DC Field | Value | Language

dc.contributor.advisor | Guyot, Romain
dc.contributor.advisor | Isaza Echeverri, Gustavo Adolfo
dc.contributor.author | Orozco Arias, Simon
dc.date.accessioned | 2022-05-03T22:58:23Z
dc.date.available | 2022-05-03T22:58:23Z
dc.date.issued | 2022-05-04
dc.identifier.uri | https://repositorio.ucaldas.edu.co/handle/ucaldas/17590
dc.description | Illustrations | spa
dc.description.abstract | This doctoral thesis focused on the application of machine learning and deep learning techniques to the study of LTR retrotransposons, with the aim of improving genome-level understanding of plants of agro-industrial interest such as rice, maize, coffee and sugarcane; the approach could also be applied to any other plant genome or to other organisms. Recent research has shown the impact of transposable elements on the phenotype of crops of interest, such as the color of maize kernels, the color and flavor of oranges, the skin color of potatoes, the size and shape of tomatoes, and the color and flavor of grapes, all of which are caused by the insertion of these elements near or within genes. Although bioinformatics techniques and tools exist for detecting and classifying transposable elements, reliable results cannot yet be obtained because of the great diversity of their structures, replication patterns and life cycles. In addition, these genomic components have characteristics that make them very difficult to study, such as species specificity, high nucleotide-level diversity (low homology between sequences), long non-coding regions and their repetitive nature. New techniques such as machine learning and deep learning could therefore improve both execution time and the accuracy of results. This research project used the best-known machine learning algorithms, as well as several deep neural network architectures that have become widespread in the scientific community in recent years. Feature extraction and selection methods, preprocessing techniques, algorithms and architectures that have been used successfully on datasets similar to transposable elements were extrapolated. This doctoral thesis will also have a positive impact on the scientific community in bioinformatics, genomics and agriculture, since the software developed here, and its use on other genomes, could serve as a basis for future research on genetic improvement, the evolution of species, and the relationship between organisms and the environment. Knowledge was also generated on the use of new techniques on genomic data (especially LTR retrotransposons), such as the influence of the nature of the data on the accuracy of results, better preprocessing techniques (feature selection and extraction, dimensionality reduction, data transformation, among others), and the hyperparameters and metrics best suited to these elements. Finally, this research led to the creation of a functional bioinformatics software tool that, thanks to the selected techniques, detects and classifies LTR retrotransposons in plants of interest. The software is available to the scientific community and can be used in the context of several large-scale genome sequencing and assembly projects, such as the 3,000 Rice Genomes Project, the sequencing of 10,000 plant genomes, or the project to sequence 1.5 million eukaryotic species. All code and scripts developed during this project are available at https://github.com/simonorozcoarias/MLinTEs | spa
dc.description.abstract | This PhD thesis focused on the application of machine learning and deep learning techniques to the study of LTR retrotransposons, with the aim of improving genome-level understanding of plants of agro-industrial interest such as rice, maize, coffee and sugarcane; the approach could also be applied to any other plant genome or to other organisms. Recent research has demonstrated the impact of transposable elements on the phenotype of crops of interest, such as the color of maize kernels, the color and flavor of oranges, the skin color of potatoes, the size and shape of tomatoes, and the color and flavor of grapes, which result from the insertion of these elements near or into genes. Although bioinformatics techniques and tools exist for the detection and classification of transposable elements, it is not yet possible to obtain reliable results because of the great diversity of their structures, replication patterns and life cycles. In addition, these genomic components have characteristics that make them very complex to study, such as species specificity, high diversity at the nucleotide level (low homology between sequences), long non-coding regions and their repetitive nature. Therefore, new techniques such as machine learning and deep learning could improve both execution time and the accuracy of results. In this research project, the best-known machine learning algorithms were used, as well as several deep neural network architectures that have become widespread in the scientific community in recent years. Feature extraction and selection methods, pre-processing techniques, algorithms and architectures that have been used successfully on datasets similar to transposable elements were extrapolated. This PhD thesis will also have a positive impact on the scientific community in the fields of bioinformatics, genomics and agriculture, as the software developed here and its use on other genomes could serve as a basis for future research on genetic improvement, the evolution of species, and the relationship between organisms and the environment. In addition, knowledge was generated on the use of new techniques on genomic data (especially LTR retrotransposons), such as the influence of the nature of the data on the accuracy of the results, better pre-processing techniques (feature selection and extraction, dimensionality reduction, data transformation, among others), and hyperparameters and metrics better suited to these elements. Finally, this research led to the creation of a functional bioinformatics software tool that, thanks to the selected techniques, detects and classifies LTR retrotransposons in plants of interest. This software is available to the scientific community and can be used in the context of several massive genome sequencing and assembly projects, such as the 3,000 Rice Genomes Project, the sequencing of 10,000 plant genomes, or the project to sequence 1.5 million eukaryotic species. All code and scripts developed during this project are available at https://github.com/simonorozcoarias/MLinTEs | eng
dc.description.tableofcontents | Contents: Acknowledgements / 1. Introduction / 1.1. Background / 1.2. Research problem / 1.3. Justification / 1.4. Research questions / 1.5. Research hypothesis / 1.6. Organization of this Document / 2. Thesis Objectives / 2.1. General Objective / 2.2. Specific Objectives / 3. The State of the Art / 3.1. Context about retrotransposons and their characteristics / 3.2. Context about machine learning models in TEs / 3.3. Conclusions and perspectives / 4. DNA coding schemes and measuring metrics / 4.1. Context / 4.2. Conclusions and perspectives / 5. InpactorDB / 5.1. Context / 5.2. Conclusions and perspectives / 6. K-mers-based-methods / 6.1. Context / 6.2. Conclusions and perspectives / 7. Neural Network to curate LTR retrotransposons libraries / 7.1. Context / 7.2. Conclusions and perspectives / 8. Inpactor2: A one-shot software based on deep learning / 8.1. Context / 8.2. Conclusions and perspectives / 9. Application of a DL-based tool to the identification and classification of LTR retrotransposons in the genus Coffea / 9.1. Abstract / 9.2. Introduction / 9.3. Materials and methods / 9.3.1. Coffea sequencing resources available / 9.3.2. Creation of coffee dataset for re-training Inpactor2 / 9.3.3. Library of LTR-RTs in Coffea genus and its annotation / 9.3.4. Data analysis and visualization / 9.3.5. Raw Illumina reads mapping results / 9.4. Results / 9.4.1. Re-training of the model for the Coffea genus / 9.4.2. Construction of a LTR-RT library for the Coffea genus / 9.4.3. Utilization of a Coffea LTR-RT library for the annotation of assemblies in the Coffea genus / 9.4.4. Relationship between the LTR-RT proportion and the genome size assembly / 9.5. Discussion / 9.6. Conclusion / Appendices / A. Appendix A / B. Appendix B / 10. Discussions, conclusions, and contributions / 10.1. Discussions / 10.1.1. DNA coding schemes and available datasets / 10.1.2. The detection problem / 10.1.3. Integration of ML models in a one-shot tool / 10.2. Conclusions / 10.3. Contributions / Bibliography | eng
dc.format.mimetype | application/pdf | spa
dc.language.iso | eng | spa
dc.language.iso | spa | spa
dc.title | A computational architecture to identify and classify LTR retrotransposons in plant genomes | eng
dc.type | Degree thesis - Doctorate | spa
dc.contributor.researchgroup | GITIR Grupo de Investigación en Tecnologías de la Información y Redes (Category A) | spa
dc.description.degreelevel | Doctorate | spa
dc.identifier.instname | Universidad de Caldas | spa
dc.identifier.reponame | Repositorio Institucional Universidad de Caldas | spa
dc.identifier.repourl | https://repositorio.ucaldas.edu.co/ | spa
dc.publisher.faculty | Faculty of Engineering | spa
dc.publisher.place | Manizales | spa
dc.relation.references | F. Choulet, A. Alberti, S. Theil, N. Glover, V. Barbe, J. Daron, L. Pingault, P. Sourdille, A. Couloux, E. Paux, and others, “Structural and functional partitioning of bread wheat chromosome 3B,” Science, vol. 345, no. 6194, p. 1249721, 2014. | spa
dc.relation.references | E. Ibarra-Laclette and E. Lyons, “Architecture and evolution of a minute plant genome,” Nature, vol. 498, no. 7452, pp. 1–6, 2013. | spa
dc.relation.references | M. I. Tenaillon, J. D. Hollister, and B. S. Gaut, “A triptych of the evolution of plant transposable elements,” Trends in Plant Science, vol. 15, no. 8, pp. 471–478, 2010. | spa
dc.relation.references | I. Makarevitch, A. J. Waters, P. T. West, M. Stitzer, C. N. Hirsch, J. Ross-Ibarra, and N. M. Springer, “Transposable Elements Contribute to Activation of Maize Genes in Response to Abiotic Stress,” PLoS Genetics, vol. 11, no. 1, 2015. | spa
dc.relation.references | E. Todorovska, “Retrotransposons and their Role in Plant-Genome Evolution,” Biotechnology & Biotechnological Equipment, vol. 2818, no. August, pp. 294–305, 2014. | spa
dc.relation.references | E. Casacuberta and J. González, “The impact of transposable elements in environmental adaptation,” Molecular Ecology, vol. 22, no. 6, pp. 1503–1517, 2013. | spa
dc.relation.references | G. Bonchev and C. Parisod, “Transposable elements and microevolutionary changes in natural populations,” Molecular Ecology Resources, vol. 13, pp. 765–775, Sep. 2013. | spa
dc.relation.references | S.-F. Li, T. Su, G.-Q. Cheng, B.-X. Wang, X. Li, C.-L. Deng, and W.-J. Gao, “Chromosome Evolution in Connection with Repetitive Sequences and Epigenetics in Plants,” Genes, vol. 8, Oct. 2017. | spa
dc.relation.references | S. Ou, J. Chen, and N. Jiang, “Assessing genome assembly quality using the LTR Assembly Index (LAI),” Nucleic Acids Research, no. August, pp. 1–11, 2018. | spa
dc.relation.references | D. Hermann, F. Egue, E. Tastard, D.-H. Nguyen, N. Casse, A. Caruso, S. Hiard, J. Marchand, B. Chenais, A. Morant-Manceau, and J. D. Rouault, “An introduction to the vast world of transposable elements - what about the diatoms?,” Diatom Research, vol. 29, pp. 91–104, Jan. 2014. | spa
dc.relation.references | F. Mascagni, A. Vangelisti, T. Giordani, A. Cavallini, and L. Natali, “Specific LTR-Retrotransposons Show Copy Number Variations between Wild and Cultivated Sunflowers,” Genes, vol. 9, p. 433, Aug. 2018. | spa
dc.relation.references | T. Wicker, F. Sabot, A. Hua-Van, J. L. Bennetzen, P. Capy, B. Chalhoub, A. Flavell, P. Leroy, M. Morgante, O. Panaud, E. Paux, P. SanMiguel, and A. H. Schulman, “A unified classification system for eukaryotic transposable elements,” Nature Reviews Genetics, vol. 8, no. 12, pp. 973–982, 2007. | spa
dc.relation.references | P. S. Schnable, D. Ware, R. S. Fulton, J. C. Stein, F. Wei, S. Pasternak, C. Liang, J. Zhang, L. Fulton, T. A. Graves, P. Minx, A. D. Reily, L. Courtney, S. S. Kruchowski, C. Tomlinson, C. Strong, K. Delehaunty, C. Fronick, B. Courtney, S. M. Rock, E. Belter, F. Du, K. Kim, R. M. Abbott, M. Cotton, A. Levy, P. Marchetto, K. Ochoa, S. M. Jackson, B. Gillam, W. Chen, L. Yan, J. Higginbotham, M. Cardenas, J. Waligorski, E. Applebaum, L. Phelps, J. Falcone, K. Kanchi, T. Thane, A. Scimone, N. Thane, J. Henke, T. Wang, J. Ruppert, N. Shah, K. Rotter, J. Hodges, E. Ingenthron, M. Cordes, S. Kohlberg, J. Sgro, B. Delgado, K. Mead, A. Chinwalla, S. Leonard, K. Crouse, K. Collura, D. Kudrna, J. Currie, R. He, A. Angelova, S. Rajasekar, T. Mueller, R. Lomeli, G. Scara, A. Ko, K. Delaney, M. Wissotski, G. Lopez, D. Campos, M. Braidotti, E. Ashley, W. Golser, H. Kim, S. Lee, J. Lin, Z. Dujmic, W. Kim, J. Talag, A. Zuccolo, C. Fan, A. Sebastian, M. Kramer, L. Spiegel, L. Nascimento, T. Zutavern, B. Miller, C. Ambroise, S. Muller, W. Spooner, A. Narechania, L. Ren, S. Wei, S. Kumari, B. Faga, M. J. Levy, L. McMahan, P. Van Buren, M. W. Vaughn, K. Ying, C.-T. Yeh, S. J. Emrich, Y. Jia, A. Kalyanaraman, A.-P. Hsia, W. B. Barbazuk, R. S. Baucom, T. P. Brutnell, N. C. Carpita, C. Chaparro, J.-M. Chia, J.-M. Deragon, J. C. Estill, Y. Fu, J. A. Jeddeloh, Y. Han, H. Lee, P. Li, D. R. Lisch, S. Liu, Z. Liu, D. H. Nagel, M. C. McCann, P. SanMiguel, A. M. Myers, D. Nettleton, J. Nguyen, B. W. Penning, L. Ponnala, K. L. Schneider, D. C. Schwartz, A. Sharma, C. Soderlund, N. M. Springer, Q. Sun, H. Wang, M. Waterman, R. Westerman, T. K. Wolfgruber, L. Yang, Y. Yu, L. Zhang, S. Zhou, Q. Zhu, J. L. Bennetzen, R. K. Dawe, J. Jiang, N. Jiang, G. G. Presting, S. R. Wessler, S. Aluru, R. A. Martienssen, S. W. Clifton, W. R. McCombie, R. A. Wing, and R. K. Wilson, “The B73 Maize Genome: Complexity, Diversity, and Dynamics,” Science, vol. 326, no. 5956, pp. 1112–1115, 2009. | spa
dc.relation.references | A. H. Paterson, J. E. Bowers, R. Bruggmann, I. Dubchak, J. Grimwood, H. Gundlach, G. Haberer, U. Hellsten, T. Mitros, A. Poliakov, J. Schmutz, M. Spannagl, H. Tang, X. Wang, T. Wicker, A. K. Bharti, J. Chapman, F. A. Feltus, U. Gowik, I. V. Grigoriev, E. Lyons, C. A. Maher, M. Martis, A. Narechania, R. P. Otillar, B. W. Penning, A. A. Salamov, Y. Wang, L. Zhang, N. C. Carpita, M. Freeling, A. R. Gingle, C. T. Hash, B. Keller, P. Klein, S. Kresovich, M. C. McCann, R. Ming, D. G. Peterson, M. ur Rahman, D. Ware, P. Westhoff, K. F. X. Mayer, J. Messing, and D. S. Rokhsar, “The Sorghum bicolor genome and the diversification of grasses,” Nature, vol. 457, no. 7229, pp. 551–556, 2009. | spa
dc.relation.references | F. Denoeud, L. Carretero-Paulet, A. Dereeper, G. Droc, R. Guyot, M. Pietrella, C. Zheng, A. Alberti, F. Anthony, G. Aprea, J.-M. Aury, P. Bento, M. Bernard, S. Bocs, C. Campa, A. Cenci, M.-C. Combes, D. Crouzillat, C. Da Silva, L. Daddiego, F. De Bellis, S. Dussert, O. Garsmeur, T. Gayraud, V. Guignon, K. Jahn, V. Jamilloux, T. Joet, K. Labadie, T. Lan, J. Leclercq, M. Lepelley, T. Leroy, L.-T. Li, P. Librado, L. Lopez, A. Muñoz, B. Noel, A. Pallavicini, G. Perrotta, V. Poncet, D. Pot, Priyono, M. Rigoreau, M. Rouard, J. Rozas, C. Tranchant-Dubreuil, R. VanBuren, Q. Zhang, A. C. Andrade, X. Argout, B. Bertrand, A. de Kochko, G. Graziosi, R. J. Henry, Jayarama, R. Ming, C. Nagai, S. Rounsley, D. Sankoff, G. Giuliano, V. A. Albert, P. Wincker, P. Lashermes, and others, “The coffee genome provides insight into the convergent evolution of caffeine biosynthesis,” Science, vol. 345, no. 6201, pp. 1181–4, 2014. | spa
dc.relation.references | R. de Castro Nunes, S. Orozco-Arias, D. Crouzillat, L. A. Mueller, S. R. Strickler, P. Descombes, C. Fournier, D. Moine, A. de Kochko, P. M. Yuyama, A. L. L. Vanzela, and R. Guyot, “Structure and Distribution of Centromeric Retrotransposons at Diploid and Allotetraploid Coffea Centromeric and Pericentromeric Regions,” Frontiers in Plant Science, 2018. | spa
dc.relation.references | C. M. Vicient and J. M. Casacuberta, “Impact of transposable elements on polyploid plant genomes,” Annals of Botany, vol. 120, pp. 195–207, Aug. 2017. | spa
dc.relation.references | P. This, T. Lacombe, M. Cadle-Davidson, and C. L. Owens, “Wine grape (Vitis vinifera L.) color associates with allelic variation in the domestication gene VvmybA1,” Theoretical and Applied Genetics, vol. 114, no. 4, pp. 723–730, 2007. | spa
dc.relation.references | H. Xiao, N. Jiang, E. Schaffner, E. J. Stockinger, and E. Van Der Knaap, “A retrotransposon-mediated gene duplication underlies morphological variation of tomato fruit,” Science, vol. 319, no. 5869, pp. 1527–1530, 2008. | spa
dc.relation.references | M. Momose, Y. Abe, and Y. Ozeki, “Miniature inverted-repeat transposable elements of stowaway are active in potato,” Genetics, vol. 186, no. 1, pp. 59–66, 2010. | spa
dc.relation.references | E. Butelli, C. Licciardello, Y. Zhang, J. Liu, S. Mackay, P. Bailey, G. Reforgiato-Recupero, and C. Martin, “Retrotransposons control fruit-specific, cold-dependent accumulation of anthocyanins in blood oranges,” The Plant Cell, vol. 24, pp. 1242–55, Mar. 2012. | spa
dc.relation.references | L. Wei and X. Cao, “The effect of transposable elements on phenotypic variation: insights from plants to humans,” Science China Life Sciences, vol. 59, pp. 24–37, Jan. 2016. | spa
dc.relation.references | C. Vitte, M.-A. Fustier, K. Alix, and M. I. Tenaillon, “The bright side of transposons in crop evolution,” Briefings in Functional Genomics, vol. 13, no. 4, pp. 276–295, 2014. | spa
dc.relation.references | P. Baduel and V. Colot, “The epiallelic potential of transposable elements and its evolutionary significance in plants,” Philosophical Transactions of the Royal Society B, vol. 376, no. 1826, p. 20200123, 2021. | spa
dc.relation.references | J. Arango-López, S. Orozco-Arias, J. A. Salazar, and R. Guyot, “Application of Data Mining Algorithms to Classify Biological Data: The Coffea canephora Genome Case,” in Advances in Computing, vol. 735, pp. 156–170, Springer, 2017. | spa
dc.relation.references | L. Schietgat, C. Vens, R. Cerri, C. N. Fischer, E. Costa, J. Ramon, C. M. A. Carareto, and H. Blockeel, “A machine learning based framework to identify and classify long terminal repeat retrotransposons,” PLoS Computational Biology, vol. 14, p. e1006097, Apr. 2018. | spa
dc.relation.references | T. Loureiro, N. Fonseca, and R. Camacho, Application of Machine Learning techniques on the Discovery and annotation of Transposons in genomes. MSc thesis, Faculdade de Engenharia, Universidade do Porto, 2012. | spa
dc.relation.references | M. Dupeyron, Dynamique et évolution de deux lignées remarquables de rétrotransposons à LTR dans le genre Coffea (famille des Rubiacées). PhD thesis, Montpellier, 2017. | spa
dc.relation.references | K. Rawal and R. Ramaswamy, “Genome-wide analysis of mobile genetic element insertion sites,” Nucleic Acids Research, vol. 39, no. 16, pp. 6864–6878, 2011. | spa
dc.relation.references | R. N. Mustafin and E. K. Khusnutdinova, “The Role of Transposons in Epigenetic Regulation of Ontogenesis,” Russian Journal of Developmental Biology, vol. 49, pp. 61–78, Mar. 2018. | spa
dc.relation.references | W. Bao, K. K. Kojima, and O. Kohany, “Repbase Update, a database of repetitive elements in eukaryotic genomes,” Mobile DNA, vol. 6, no. 1, pp. 4–9, 2015. | spa
dc.relation.references | J. Amselem, G. Cornut, N. Choisne, M. Alaux, F. Alfama-Depauw, V. Jamilloux, F. Maumus, T. Letellier, I. Luyten, C. Pommier, A. F. Adam-Blondon, and H. Quesneville, “RepetDB: A unified resource for transposable element references,” Mobile DNA, vol. 10, no. 1, pp. 1–9, 2019. | spa
dc.relation.references | M. Spannagl, T. Nussbaumer, K. C. Bader, M. M. Martis, M. Seidel, K. G. Kugler, H. Gundlach, and K. F. Mayer, “PGSB plantsDB: Updates to the database framework for comparative plant genome research,” Nucleic Acids Research, vol. 44, no. D1, pp. D1141–D1147, 2016. | spa
dc.relation.references | S. Orozco-Arias, P. A. Jaimes, M. S. Candamil, C. F. Jiménez-Varón, R. Tabares-Soto, G. Isaza, and R. Guyot, “InpactorDB: A classified lineage-level plant LTR retrotransposon reference library for free-alignment methods based on machine learning,” Genes, vol. 12, no. 2, pp. 1–17, 2021. | spa
dc.relation.references | T. Loureiro, R. Camacho, J. Vieira, and N. A. Fonseca, “Improving the performance of Transposable Elements detection tools,” Journal of Integrative Bioinformatics, vol. 10, no. 3, p. 231, 2013. | spa
dc.relation.references | S. Orozco-Arias, R. Tabares-Soto, D. Ceballos, and R. Guyot, “Parallel Programming in Biological Sciences, Taking Advantage of Supercomputing in Genomics,” in Advances in Computing (A. Solano and H. Ordonez, eds.), vol. 735, pp. 627–643, Zurich: Springer, 2017. | spa
dc.relation.references | S. Orozco-Arias, G. Isaza, R. Guyot, and R. Tabares-Soto, “A systematic review of the application of machine learning in the detection and classification of transposable elements,” PeerJ, vol. 7, p. e8311, 2019. | spa
dc.relation.references | R. Tabares-Soto, S. Orozco-Arias, V. Romero-Cano, V. S. Bucheli, J. L. Rodríguez-Sotelo, and C. F. Jiménez-Varón, “A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data,” PeerJ Computer Science, vol. 6, p. e270, 2020. | spa
dc.relation.references | M. W. Libbrecht and W. S. Noble, “Machine learning applications in genetics and genomics,” Nature Reviews Genetics, vol. 16, no. 6, pp. 321–332, 2015. | spa
dc.relation.references | P. Larrañaga, B. Calvo, R. Santana, C. Bielza, J. Galdiano, I. Inza, J. A. Lozano, R. Armañanzas, G. Santafé, A. Pérez, and V. Robles, “Machine learning in bioinformatics,” Briefings in Bioinformatics, vol. 7, no. 1, pp. 86–112, 2006. | spa
dc.relation.references | F. K. Nakano, S. M. Mastelini, S. Barbon, and R. Cerri, “Improving Hierarchical Classification of Transposable Elements using Deep Neural Networks,” in Proceedings of the International Joint Conference on Neural Networks, vol. 8-13 July, (Rio de Janeiro), IEEE, 2018. | spa
dc.relation.references | O. A. Montesinos-López, A. Montesinos-López, P. Pérez-Rodríguez, J. A. Barrón-López, J. W. Martini, S. B. Fajardo-Flores, L. S. Gaytan-Lugo, P. C. Santana-Mancilla, and J. Crossa, “A review of deep learning applications for genomic selection,” BMC Genomics, vol. 22, no. 1, pp. 1–23, 2021. | spa
dc.relation.references | M. H. P. da Cruz, P. T. M. Saito, A. R. Paschoal, and P. H. Bugatti, “Classification of Transposable Elements by Convolutional Neural Networks,” in Lecture Notes in Computer Science, vol. 11509, pp. 157–168, Springer International Publishing, 2019. | spa
dc.relation.references | FAO, FIDA, OMS, PMA, and UNICEF, “La seguridad alimentaria y la nutrición en el mundo,” tech. rep., ONU, Roma, 2020. | spa
dc.relation.references | ONU, “Alimentación,” 2018. | spa
dc.relation.references | C. A. Deutsch, J. J. Tewksbury, M. Tigchelaar, D. S. Battisti, S. C. Merrill, R. B. Huey, and R. L. Naylor, “Increase in crop losses to insect pests in a warming climate,” Science, vol. 361, no. 6405, pp. 916–919, 2018. | spa
dc.relation.references | R. Tito, H. L. Vasconcelos, and K. J. Feeley, “Global Climate Change Increases Risk of Crop Yield Losses and Food Insecurity in the Tropical Andes,” Global Change Biology, vol. 24, no. 2, 2017. | spa
dc.relation.references | N. Jiang, “Overview of Repeat Annotation and De Novo Repeat Identification,” in Plant Transposable Elements, pp. 275–287, Springer, 2013. | spa
dc.relation.references | G. Abrusán, N. Grundmann, L. Demester, and W. Makalowski, “TEclass - A tool for automated classification of unknown eukaryotic transposable elements,” Bioinformatics, vol. 25, no. 10, pp. 1329–1330, 2009. | spa
dc.relation.references | G. Eraslan, Ž. Avsec, J. Gagneur, and F. J. Theis, “Deep learning: new computational modelling techniques for genomics,” Nature Reviews Genetics, vol. 20, no. 7, pp. 389–403, 2019. | spa
dc.relation.references | T. Yue and H. Wang, “Deep learning for genomics: A concise overview,” arXiv preprint arXiv:1802.00810, 2018. | spa
dc.relation.references | J. Zou, M. Huss, A. Abid, P. Mohammadi, A. Torkamani, and A. Telenti, “A primer on deep learning in genomics,” Nature Genetics, vol. 51, no. 1, pp. 12–18, 2019. | spa
dc.relation.references | L. Koumakis, “Deep learning models in genomics; are we there yet?,” Computational and Structural Biotechnology Journal, vol. 18, pp. 1466–1473, 2020. | spa
dc.relation.references | M. H. P. da Cruz, D. S. Domingues, P. T. M. Saito, A. R. Paschoal, and P. H. Bugatti, “TERL: classification of transposable elements by convolutional neural networks,” Briefings in Bioinformatics, vol. 22, May 2021. | spa
dc.relation.references | H. Yan, A. Bombarely, and S. Li, “DeepTE: a computational method for de novo classification of transposons with convolutional neural network,” Bioinformatics (Oxford, England), 2020. | spa
dc.relation.references | S. Orozco-Arias, M. S. Candamil-Cortes, P. A. Jaimes, E. Valencia-Castrillon, R. Tabares-Soto, R. Guyot, and G. Isaza, “Deep neural network to curate LTR retrotransposon libraries from plant genomes,” in International Conference on Practical Applications of Computational Biology & Bioinformatics, pp. 85–94, Springer, 2021. | spa
dc.relation.references | N.-S. Kim, “The genomes and transposable elements in plants: are they friends or foes?,” Genes & Genomics, vol. 39, pp. 359–370, Apr. 2017. | spa
dc.relation.references | G. Usai, F. Mascagni, L. Natali, T. Giordani, and A. Cavallini, “Comparative genome-wide analysis of repetitive DNA in the genus Populus L.,” Tree Genetics & Genomes, vol. 13, p. 96, Oct. 2017. | spa
dc.relation.references | C. R. L. Huang, K. H. Burns, and J. D. Boeke, “Active transposition in genomes,” Annual Review of Genetics, vol. 46, pp. 651–75, Dec. 2012. | spa
dc.relation.references | A. Testori, L. Caizzi, S. Cutrupi, O. Friard, M. De Bortoli, D. Cora, and M. Caselle, “The role of transposable elements in shaping the combinatorial interaction of transcription factors,” BMC Genomics, vol. 13, no. 1, pp. 1–16, 2012. | spa
dc.relation.references | M.-A. Grandbastien, “LTR retrotransposons, handy hitchhikers of plant regulation and stress response,” Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, vol. 1849, pp. 403–416, Apr. 2015. | spa
dc.relation.references | N. Krom and W. Ramakrishna, “Retrotransposon insertions in rice gene pairs associated with reduced conservation of gene pairs in grass genomes,” Genomics, vol. 99, pp. 308–14, May 2012. | spa
dc.relation.references | J. Lee, N. E. Waminal, H.-I. Choi, S. Perumal, S.-C. Lee, V. B. Nguyen, W. Jang, N.-H. Kim, L.-Z. Gao, and T.-J. Yang, “Rapid amplification of four retrotransposon families promoted speciation and genome size expansion in the genus Panax,” Scientific Reports, vol. 7, p. 9045, Aug. 2017. | spa
dc.relation.references | M. Elbaidouri and O. Panaud, “Genome-Wide Analysis of Transposition Using Next Generation Sequencing Technologies,” in Plant Transposable Elements, pp. 59–70, Springer, 2012. | spa
dc.relation.references | L. Wang, Y. He, H. Qiu, J. Guo, M. Han, J. Zhou, Q. Sun, and J. Sun, “Mdoryco1-1, a bidirectionally transcriptional Ty1-copia retrotransposon from Malus x domestica,” Scientia Horticulturae, vol. 220, pp. 283–290, Jun. 2017. | spa
dc.relation.references | R. C. Paz, M. E. Kozaczek, H. G. Rosli, N. P. Andino, and M. V. Sanchez-Puerta, “Diversity, distribution and dynamics of full-length Copia and Gypsy LTR retroelements in Solanum lycopersicum,” Genetica, vol. 145, pp. 417–430, Oct. 2017. | spa
dc.relation.references | M. Iquebal, S. Jaiswal, C. Mukhopadhyay, C. Sarkar, A. Rai, and D. Kumar, “Applications of bioinformatics in plant and agriculture,” in PlantOmics: The Omics of Plant Science, pp. 755–789, Springer, 2015. | spa
dc.relation.references | H. Z. Girgis, “Red: An intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale,” BMC Bioinformatics, vol. 16, no. 1, pp. 1–19, 2015. | spa
dc.relation.references | Arabidopsis Genome Initiative, S. Kaul, H. L. Koo, J. Jenkins, M. Rizzo, T. Rooney, L. J. Tallon, T. Feldblyum, W. Nierman, M. I. Benito, X. Lin, and others, “Analysis of the genome sequence of the flowering plant Arabidopsis thaliana,” Nature, vol. 408, no. December, pp. 796–815, 2000. | spa
dc.relation.references | J. Yu, S. Hu, J. Wang, G. K. Wong, S. Li, B. Liu, Y. Deng, L. Dai, Y. Zhou, X. Zhang, M. Cao, J. Liu, J. Sun, J. Tang, Y. Chen, X. Huang, W. Lin, C. Ye, W. Tong, L. Cong, J. Geng, Y. Han, L. Li, W. Li, G. Hu, J. Li, Z. Liu, Q. Qi, T. Li, X. Wang, H. Lu, T. Wu, M. Zhu, P. Ni, H. Han, W. Dong, X. Ren, X. Feng, P. Cui, X. Li, H. Wang, X. Xu, W. Zhai, Z. Xu, J. Zhang, S. He, J. Xu, K. Zhang, X. Zheng, J. Dong, W. Zeng, L. Tao, J. Ye, J. Tan, X. Chen, J. He, D. Liu, W. Tian, C. Tian, H. Xia, Q. Bao, G. Li, H. Gao, T. Cao, W. Zhao, P. Li, W. Chen, Y. Zhang, J. Hu, S. Liu, J. Yang, G. Zhang, Y. Xiong, Z. Li, L. Mao, C. Zhou, Z. Zhu, R. Chen, B. Hao, W. Zheng, S. Chen, W. Guo, M. Tao, L. Zhu, L. Yuan, and H. Yang, “A draft sequence of the rice genome (Oryza sativa L. ssp. indica),” Science, vol. 296, no. 5565, pp. 79–92, 2002. | spa
dc.relation.referencesR. Akakpo, M.-C. Carpentier, Y. Ie Hsing, and O. Panaud, “ e impact of transposable elements on the structure, evolution and function of the rice genome,” New Phytologist, vol. 226, no. 1, pp. 44–49, 2020.spa
dc.relation.referencesM. Dom´ınguez, E. Dugas, M. Benchouaia, B. Leduque, J. M. Jimenez-G ´ omez, V. Colot, and ´ L. adrana, “ e impact of transposable elements on tomato diversity,” Nature communications, vol. 11, no. 1, pp. 1–11, 2020.spa
dc.relation.referencesD. Almojil, Y. Bourgeois, M. Falis, I. Hariyani, J. Wilcox, and S. Boissinot, “ e structural, functional and evolutionary impact of transposable elements in eukaryotes,” Genes, vol. 12, no. 6, p. 918, 2021.spa
dc.relation.referencesL. Sun, Y. Jing, X. Liu, Q. Li, Z. Xue, Z. Cheng, D. Wang, H. He, and W. Qian, “Heat stressinduced transposon activation correlates with 3d chromatin organization rearrangement in arabidopsis,” Nature communications, vol. 11, no. 1, pp. 1–13, 2020.spa
dc.relation.referencesS. A. Montgomery, Y. Tanizawa, B. Galik, N. Wang, T. Ito, T. Mochizuki, S. Akimcheva, J. L. Bowman, V. Cognat, L. Marechal-Drouard, ´ et al., “Chromatin organization in early land plants reveals an ancestral association between h3k27me3, transposons, and constitutive heterochromatin,” Current Biology, vol. 30, no. 4, pp. 573–588, 2020spa
dc.relation.referencesS. Alseekh, F. Scossa, and A. R. Fernie, “Mobile transposable elements shape plant genome diversity,” Trends in Plant Science, vol. 25, no. 11, pp. 1062–1064, 2020.spa
dc.relation.referencesS. Pimpinelli and L. Piacentini, “Environmental change and the evolution of genomes: Transposable elements as translators of phenotypic plasticity into genotypic variability,” Functional Ecology, vol. 34, no. 2, pp. 428–441, 2020.spa
dc.relation.referencesS. Orozco-arias, J. Liu, R. T.-s. Id, D. Ceballos, D. Silva, D. Id, R. Ming, and R. Guyot, “Inpactor, Integrated and Parallel Analyzer and Classi er of LTR Retrotransposons and Its Application for Pineapple LTR Retrotransposons Diversity and Dynamics,” Biology, 2018.spa
dc.relation.referencesL. van Dorp, C. J. Houldcroft, D. Richard, and F. Balloux, “COVID-19, the first pandemic in the post-genomic era,” Current Opinion in Virology, 2021.spa
dc.relation.referencesT. Flutre, E. Duprat, C. Feuillet, and H. Quesneville, “Considering transposable element diversification in de novo annotation approaches,” PLoS ONE, vol. 6, no. 1, p. e16526, 2011.spa
dc.relation.referencesS. Orozco-Arias, G. Isaza, and R. Guyot, “Retrotransposons in plant genomes: structure, identification, and classification through bioinformatics and machine learning,” International Journal of Molecular Sciences, vol. 20, no. 15, p. 3837, 2019.spa
dc.relation.referencesS. Ou, W. Su, Y. Liao, K. Chougule, J. R. Agda, A. J. Hellinga, C. S. B. Lugo, T. A. Elliott, D. Ware, T. Peterson, N. Jiang, C. N. Hirsch, and M. B. Hufford, “Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline,” Genome Biology, vol. 20, no. 1, pp. 1–18, 2019.spa
dc.relation.referencesD. R. Hoen, G. Hickey, G. Bourque, J. Casacuberta, R. Cordaux, C. Feschotte, A.-S. Fiston-Lavier, A. Hua-Van, R. Hubley, A. Kapusta, et al., “A call for benchmarking transposable element annotation methods,” Mobile DNA, vol. 6, no. 1, pp. 1–9, 2015.spa
dc.relation.referencesK. A. Shastry and H. Sanjay, “Machine learning for bioinformatics,” in Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications, pp. 25–39, Springer, 2020.spa
dc.relation.referencesE. Naresh, B. V. Kumar, S. P. Shankar, et al., “Impact of machine learning in bioinformatics research,” in Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications, pp. 41–62, Springer, 2020.spa
dc.relation.referencesI.-C. Giassa and P. Alexiou, “Bioinformatics and machine learning approaches to understand the regulation of mobile genetic elements,” Biology, vol. 10, no. 9, p. 896, 2021.spa
dc.relation.referencesE. Routhier, A. Bin Kamruddin, and J. Mozziconacci, “keras_dna: a wrapper for fast implementation of deep learning models in genomics,” Bioinformatics, vol. 37, no. 11, pp. 1593–1594, 2021.spa
dc.relation.referencesW. Kopp, R. Monti, A. Tamburrini, U. Ohler, and A. Akalin, “Deep learning for genomics using Janggu,” Nature Communications, vol. 11, no. 1, pp. 1–7, 2020.spa
dc.relation.referencesA. Kashfeen and L. McMillan, “Frontier: finding the boundaries of novel transposable element insertions in genomes,” in Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 1–10, 2021.spa
dc.relation.referencesM. Panta, A. Mishra, M. T. Hoque, and J. Atallah, “ClassifyTE: a stacking-based prediction of hierarchical classification of transposable elements,” Bioinformatics, 2021.spa
dc.relation.referencesK. Riehl, C. Riccio, E. A. Miska, and M. Hemberg, “TransposonUltimate: software for transposon classification, annotation and detection,” bioRxiv, 2021.spa
dc.relation.referencesS. Orozco-Arias, G. Isaza, R. Guyot, and R. Tabares-Soto, “A systematic review of the application of machine learning in the detection and classification of transposable elements,” PeerJ, vol. 7, p. 18311, 2019.spa
dc.relation.referencesC. Ma, H. H. Zhang, and X. Wang, “Machine learning for Big Data analytics in plants,” Trends in Plant Science, vol. 19, no. 12, pp. 798–808, 2014.spa
dc.relation.referencesF. K. Nakano, W. J. Pinto, G. L. Pappa, and R. Cerri, “Top-down strategies for hierarchical classification of transposable elements with neural networks,” in Proceedings of the International Joint Conference on Neural Networks, vol. 2017-May, (Anchorage, AK, United States), pp. 2539–2546, 2017.spa
dc.relation.referencesE. A. Bell, C. L. Butler, C. Oliveira, S. Marburger, L. Yant, and M. I. Taylor, “Transposable element annotation in non-model species: the benefits of species-specific repeat libraries using semi-automated EDTA and DeepTE de novo pipelines,” Molecular Ecology Resources, 2021.spa
dc.relation.referencesT. Flutre, E. Permal, and H. Quesneville, “Transposable element annotation in completely sequenced eukaryote genomes,” in Plant Transposable Elements, pp. 17–39, Springer, 2012.spa
dc.relation.referencesC. Feschotte, N. Jiang, and S. R. Wessler, “Plant transposable elements: Where genetics meets genomics,” Nature Reviews Genetics, vol. 3, pp. 329–341, May 2002.spa
dc.relation.referencesJ. F. Pereira and P. R. Ryan, “The role of transposable elements in the evolution of aluminium resistance in plants,” Journal of Experimental Botany, vol. 70, pp. 41–54, Oct. 2018.spa
dc.relation.referencesM. Sahebi, M. M. Hanafi, A. J. van Wijnen, D. Rice, M. Y. Rafii, P. Azizi, M. Osman, S. Taheri, M. F. A. Bakar, M. N. M. Isa, et al., “Contribution of transposable elements in the plant’s genome,” Gene, vol. 665, pp. 155–166, 2018.spa
dc.relation.referencesB. McClintock, “The Significance of Responses of the Genome to Challenge,” Science, vol. 226, no. 4676, pp. 792–801, 1984.spa
dc.relation.referencesV. Horvath, M. Merenciano, and J. González, “Revisiting the Relationship between Transposable Elements and the Eukaryotic Stress Response,” Trends in Genetics, vol. 33, no. 11, pp. 832–841, 2017.spa
dc.relation.referencesC. A. Thomas, “The Genetic Organization of Chromosomes,” Annual Review of Genetics, vol. 5, no. 1, pp. 237–256, 1971.spa
dc.relation.referencesT. P. Michael, “Plant genome size variation: bloating and purging DNA,” Briefings in Functional Genomics, vol. 13, pp. 308–317, Mar. 2014.spa
dc.relation.referencesX. Dai, H. Wang, H. Zhou, L. Wang, J. Dvořák, J. L. Bennetzen, and H.-G. Müller, “Birth and Death of LTR-Retrotransposons in Aegilops tauschii,” Genetics, vol. 210, pp. 1039–1051, Aug. 2018.spa
dc.relation.referencesS.-I. Lee and N.-S. Kim, “Transposable Elements and Genome Size Variations in Plants,” Genomics & Informatics, vol. 12, no. 3, p. 87, 2014.spa
dc.relation.referencesE. R. Havecker, X. Gao, and D. F. Voytas, “The diversity of LTR retrotransposons,” Genome Biology, vol. 5, no. 6, p. 225, 2004.spa
dc.relation.referencesJ. M. Casacuberta, S. Jackson, O. Panaud, M. Purugganan, and J. Wendel, “Evolution of Plant Phenotypes, from Genomes to Traits,” G3: Genes|Genomes|Genetics, vol. 6, pp. 775–778, Apr. 2016.spa
dc.relation.referencesC. M. Bergman and H. Quesneville, “Discovering and detecting transposable elements in genome sequences,” Briefings in Bioinformatics, vol. 8, no. 6, pp. 382–392, 2007.spa
dc.relation.referencesH. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, “Transfer learning for time series classification,” in 2018 IEEE International Conference on Big Data (Big Data), pp. 1367–1376, 2018.spa
dc.relation.referencesJ.-C. Charr, A. Garavito, C. Guyeux, D. Crouzillat, P. Descombes, C. Fournier, S. N. Ly, E. N. Raharimalala, J.-J. Rakotomalala, P. Stoffelen, S. Janssens, P. Hamon, and R. Guyot, “Complex evolutionary history of coffees revealed by full plastid genomes and 28,800 nuclear SNP analyses, with particular emphasis on Coffea canephora (Robusta coffee),” Molecular Phylogenetics and Evolution, vol. 151, p. 106906, 2020.spa
dc.relation.referencesR. Guyot, T. Darre, M. Dupeyron, A. de Kochko, S. Hamon, E. Couturon, D. Crouzillat, M. Rigoreau, J.-J. Rakotomalala, N. E. Raharimalala, S. D. Akaffou, and P. Hamon, “Partial sequencing reveals the transposable element composition of Coffea genomes and provides evidence for distinct evolutionary stories,” Molecular Genetics and Genomics: MGG, vol. 291, pp. 1979–1990, Oct. 2016.spa
dc.relation.referencesR. Guyot, P. Hamon, E. Couturon, N. Raharimalala, J.-J. Rakotomalala, S. Lakkanna, S. Sabatier, A. Affouard, and P. Bonnet, “WCSdb: a database of wild Coffea species,” Database, vol. 2020, baaa069, Nov. 2020.spa
dc.relation.referencesP. Lashermes, V. Paczek, P. Trouslot, M. Combes, E. Couturon, and A. Charrier, “Brief communication. Single-locus inheritance in the allotetraploid Coffea arabica L. and interspecific hybrid C. arabica × C. canephora,” Journal of Heredity, vol. 91, pp. 81–85, Jan. 2000.spa
dc.relation.referencesP. Hamon, C. E. Grover, A. P. Davis, J.-J. Rakotomalala, N. E. Raharimalala, V. A. Albert, H. L. Sreenath, P. Stoffelen, S. E. Mitchell, E. Couturon, S. Hamon, A. de Kochko, D. Crouzillat, M. Rigoreau, U. Sumirat, S. Akaffou, and R. Guyot, “Genotyping-by-sequencing provides the first well-resolved phylogeny for coffee (Coffea) and insights into the evolution of caffeine content in its species: GBS coffee phylogeny and the evolution of caffeine content,” Molecular Phylogenetics and Evolution, vol. 109, pp. 351–361, 2017.spa
dc.relation.referencesN. Raharimalala, S. Rombauts, A. McCarthy, A. Garavito, S. Orozco-Arias, L. Bellanger, A. Y. Morales-Correa, S. Froger, S. Michaux, V. Berry, S. Metairon, C. Fournier, M. Lepelley, L. Mueller, E. Couturon, P. Hamon, J.-J. Rakotomalala, P. Descombes, R. Guyot, and D. Crouzillat, “The absence of the caffeine synthase gene is involved in the naturally decaffeinated status of Coffea humblotiana, a wild species from Comoro archipelago,” Scientific Reports, vol. 11, no. 1, pp. 1–14, 2021.spa
dc.relation.referencesJ.-C. Charr, A. Garavito, C. Guyeux, D. Crouzillat, P. Descombes, C. Fournier, S. N. Ly, E. N. Raharimalala, J.-J. Rakotomalala, P. Stoffelen, et al., “Complex evolutionary history of coffees revealed by full plastid genomes and 28,800 nuclear SNP analyses, with particular emphasis on Coffea canephora (Robusta coffee),” Molecular Phylogenetics and Evolution, vol. 151, p. 106906, 2020.spa
dc.relation.referencesP. Hamon, P. O. Duroy, C. Dubreuil-Tranchant, P. Mafra D’Almeida Costa, C. Duret, N. J. Razafinarivo, E. Couturon, S. Hamon, A. De Kochko, V. Poncet, and R. Guyot, “Two novel Ty1-copia retrotransposons isolated from coffee trees can effectively reveal evolutionary relationships in the Coffea genus (Rubiaceae),” Molecular Genetics and Genomics, vol. 285, no. 6, pp. 447–460, 2011.spa
dc.relation.referencesM. Dupeyron, R. F. de Souza, P. Hamon, A. de Kochko, D. Crouzillat, E. Couturon, D. S. Domingues, and R. Guyot, “Distribution of Divo in Coffea genomes, a poorly described family of angiosperm LTR-Retrotransposons,” Molecular Genetics and Genomics, vol. 292, pp. 741–754, Aug. 2017.spa
dc.relation.referencesA. V. Zimin, G. Marçais, D. Puiu, M. Roberts, S. L. Salzberg, and J. A. Yorke, “The MaSuRCA genome assembler,” Bioinformatics, vol. 29, pp. 2669–2677, Aug. 2013.spa
dc.relation.referencesM. Seppey, M. Manni, and E. M. Zdobnov, “BUSCO: Assessing genome assembly and annotation completeness,” in Methods in Molecular Biology, vol. 1962, pp. 227–245, Humana Press Inc., 2019.spa
dc.relation.referencesE. M. McCarthy and J. F. McDonald, “LTR_STRUC: A novel search and identification program for LTR retrotransposons,” Bioinformatics, vol. 19, no. 3, pp. 362–367, 2003.spa
dc.relation.referencesS. Orozco-Arias, M. S. Candamil-Cortés, P. A. Jaimes, J. S. Piña, R. Tabares-Soto, R. Guyot, and G. Isaza, “K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes,” PeerJ, vol. 9, p. e11456, May 2021.spa
dc.relation.referencesN. Chen, “Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences,” Current Protocols in Bioinformatics, vol. 5, no. 1, pp. 4.10.1–4.10.14, 2004.spa
dc.relation.referencesR Core Team, “R: A language and environment for statistical computing,” Vienna: R Foundation, 2016.spa
dc.relation.referencesI. Letunic and P. Bork, “Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation,” Nucleic Acids Research, vol. 49, pp. W293–W296, Apr. 2021.spa
dc.relation.referencesB. Langmead and S. L. Salzberg, “Fast gapped-read alignment with Bowtie 2,” Nature Methods, vol. 9, no. 4, p. 357, 2012.spa
dc.relation.referencesN. J. Razafinarivo, J. J. Rakotomalala, S. C. Brown, M. Bourge, S. Hamon, A. de Kochko, V. Poncet, C. Dubreuil-Tranchant, E. Couturon, R. Guyot, and P. Hamon, “Geographical gradients in the genome size variation of wild coffee trees (Coffea) native to Africa and Indian Ocean islands,” Tree Genetics and Genomes, vol. 8, no. 6, pp. 1345–1358, 2012.spa
dc.relation.referencesC. E. Grover and J. F. Wendel, “Recent Insights into Mechanisms of Genome Size Change in Plants,” Journal of Botany, vol. 2010, pp. 1–8, 2010.spa
dc.relation.referencesR. J. Schley, J. Pellicer, X.-J. Ge, C. Barrett, S. Bellot, M. S. Guignard, P. Novák, J. Suda, D. Fraser, W. J. Baker, S. Dodsworth, J. Macas, A. R. Leitch, and I. J. Leitch, “The Ecology of Palm Genomes: Repeat-associated genome size expansion is constrained by aridity,” bioRxiv, 2021.spa
dc.relation.referencesK. Y. Yip, C. Cheng, and M. Gerstein, “Machine learning and genome annotation: a match meant to be?,” Genome Biology, vol. 14, no. 5, pp. 1–10, 2013.spa
dc.relation.referencesC. Xu and S. A. Jackson, “Machine learning and complex biological data,” 2019.spa
dc.relation.referencesN. Yu, X. Guo, F. Gu, and Y. Pan, “DNA as X: An information-coding-based model to improve the sensitivity in comparative gene analysis,” in International Symposium on Bioinformatics Research and Applications, pp. 366–377, Springer, 2015.spa
dc.relation.referencesM. Akhtar, J. Epps, and E. Ambikairajah, “Signal processing in sequence analysis: advances in eukaryotic gene prediction,” IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 3, pp. 310–321, 2008.spa
dc.relation.referencesG. Kauer and H. Blöcker, “Applying signal theory to the analysis of biomolecules,” Bioinformatics, vol. 19, no. 16, pp. 2016–2021, 2003.spa
dc.relation.referencesG. L. Rosen, Signal processing for biologically-inspired gradient source localization and DNA sequence analysis. Georgia Institute of Technology, 2006.spa
dc.relation.referencesA. C. H. Choong and N. K. Lee, “Evaluation of convolutionary neural networks modeling of DNA sequences using ordinal versus one-hot encoding method,” in 2017 International Conference on Computer and Drone Applications (IConDA), pp. 60–65, IEEE, 2017.spa
dc.relation.referencesD. Ceballos, D. López-Álvarez, G. Isaza, R. Tabares-Soto, S. Orozco-Arias, and C. D. Ferrin, “A machine learning-based pipeline for the classification of CTX-M in metagenomics samples,” Processes, vol. 7, no. 4, p. 235, 2019.spa
dc.relation.referencesZ. Lv, H. Ding, L. Wang, and Q. Zou, “A convolutional neural network using dinucleotide one-hot encoder for identifying DNA N6-methyladenine sites in the rice genome,” Neurocomputing, vol. 422, pp. 214–221, 2021.spa
dc.relation.referencesF. Wang, P. Chainani, T. White, J. Yang, Y. Liu, and B. Soibam, “Deep learning identifies genome-wide DNA binding sites of long noncoding RNAs,” RNA Biology, vol. 15, no. 12, pp. 1468–1476, 2018.spa
dc.relation.referencesD. R. Kelley, Y. A. Reshef, M. Bileschi, D. Belanger, C. Y. McLean, and J. Snoek, “Sequential regulatory activity prediction across chromosomes with convolutional neural networks,” Genome Research, vol. 28, no. 5, pp. 739–750, 2018.spa
dc.relation.referencesD. Mapleson, G. Garcia Accinelli, G. Kettleborough, J. Wright, and B. J. Clavijo, “KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies,” Bioinformatics, vol. 33, no. 4, pp. 574–576, 2017.spa
dc.relation.referencesF. P. Breitwieser, D. Baker, and S. L. Salzberg, “KrakenUniq: confident and fast metagenomics classification using unique k-mer counts,” Genome Biology, vol. 19, no. 1, pp. 1–10, 2018.spa
dc.relation.referencesD. R. Zerbino and E. Birney, “Velvet: algorithms for de novo short read assembly using de Bruijn graphs,” Genome Research, vol. 18, no. 5, pp. 821–829, 2008.spa
dc.relation.referencesJ. T. Simpson, K. Wong, S. D. Jackman, J. E. Schein, S. J. Jones, and I. Birol, “ABySS: a parallel assembler for short read sequence data,” Genome Research, vol. 19, no. 6, pp. 1117–1123, 2009.spa
dc.relation.referencesH. Sun, J. Ding, M. Piednoël, and K. Schneeberger, “findGSE: estimating genome size variation within human and Arabidopsis using k-mer frequencies,” Bioinformatics, vol. 34, no. 4, pp. 550–557, 2018.spa
dc.relation.referencesA. L. Price, N. C. Jones, and P. A. Pevzner, “De novo identification of repeat families in large genomes,” Bioinformatics, vol. 21, no. suppl 1, pp. i351–i358, 2005.spa
dc.relation.referencesB. Z. Santos, G. T. Pereira, F. K. Nakano, and R. Cerri, “Strategies for selection of positive and negative instances in the hierarchical classification of transposable elements,” in 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), pp. 420–425, IEEE, 2018.spa
dc.relation.referencesW. Ashlock and S. Datta, “Distinguishing endogenous retroviral LTRs from SINE elements using features extracted from evolved side effect machines,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 9, no. 6, pp. 1676–1689, 2012.spa
dc.relation.referencesF. Liu, H. Li, C. Ren, X. Bo, and W. Shu, “PEDLA: predicting enhancers with a deep learning-based algorithmic framework,” Scientific Reports, vol. 6, no. 1, pp. 1–14, 2016.spa
dc.relation.referencesJ. T. Cuperus, B. Groves, A. Kuchina, A. B. Rosenberg, N. Jojic, S. Fields, and G. Seelig, “Deep learning of the regulatory grammar of yeast 5’ untranslated regions from 500,000 random sequences,” Genome Research, vol. 27, no. 12, pp. 2015–2024, 2017.spa
dc.relation.referencesR. S. Roy, D. Bhattacharya, and A. Schliep, “Turtle: Identifying frequent k-mers with cache-efficient algorithms,” Bioinformatics, vol. 30, no. 14, pp. 1950–1957, 2014.spa
dc.relation.referencesL. Pellegrina, C. Pizzi, and F. Vandin, “Fast approximation of frequent k-mers and applications to metagenomics,” Journal of Computational Biology, vol. 27, no. 4, pp. 534–549, 2020.spa
dc.relation.referencesP. Melsted and J. K. Pritchard, “Efficient counting of k-mers in DNA sequences using a Bloom filter,” BMC Bioinformatics, vol. 12, no. 1, pp. 1–7, 2011.spa
dc.relation.referencesF. Doshi-Velez and B. Kim, “Considerations for evaluation and generalization in interpretable machine learning,” in Explainable and interpretable models in computer vision and machine learning, pp. 3–17, Springer, 2018.spa
dc.relation.referencesM. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of machine learning. MIT Press, 2018.spa
dc.relation.referencesS.-S. Zhou, X.-M. Yan, K.-F. Zhang, H. Liu, J. Xu, S. Nie, K.-H. Jia, S.-Q. Jiao, W. Zhao, Y.-J. Zhao, et al., “A comprehensive annotation dataset of intact LTR retrotransposons of 300 plant genomes,” Scientific Data, vol. 8, no. 1, pp. 1–9, 2021.spa
dc.relation.referencesS. Ou and N. Jiang, “LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons,” Plant Physiology, vol. 176, no. 2, pp. 1410–1422, 2018.spa
dc.relation.referencesE. Lerat, “Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs,” Heredity, vol. 104, no. 6, pp. 520–533, 2010.spa
dc.relation.referencesF. M. You, S. Cloutier, Y. Shan, and R. Ragupathy, “LTR_Annotator: automated identification and annotation of LTR retrotransposons in plant genomes,” International Journal of Bioscience, Biochemistry and Bioinformatics, vol. 5, no. 3, p. 165, 2015.spa
dc.relation.referencesA. C. Wacholder, C. Cox, T. J. Meyer, R. P. Ruggiero, V. Vemulapalli, A. Damert, L. Carbone, and D. D. Pollock, “Inference of transposable element ancestry,” PLoS Genetics, vol. 10, no. 8, p. e1004482, 2014.spa
dc.relation.referencesP. Neumann, P. Novák, N. Hoštáková, and J. Macas, “Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification,” Mobile DNA, vol. 10, no. 1, pp. 1–17, 2019.spa
dc.relation.referencesN. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.spa
dc.relation.referencesH. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pp. 1322–1328, IEEE, 2008.spa
dc.relation.referencesZ. Xu and H. Wang, “LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons,” Nucleic Acids Research, vol. 35, no. suppl 2, pp. W265–W268, 2007.spa
dc.relation.referencesS. Ou and N. Jiang, “LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons,” Mobile DNA, vol. 10, no. 1, pp. 1–3, 2019.spa
dc.relation.referencesG. Chandan, A. Jain, H. Jain, et al., “Real time object detection and tracking using deep learning and OpenCV,” in 2018 International Conference on Inventive Research in Computing Applications (ICIRCA), pp. 1305–1308, IEEE, 2018.spa
dc.relation.referencesA. E. Wahabi, I. H. Baraka, S. Hamdoune, and K. E. Mokhtari, “Detection and control system for automotive products applications by artificial vision using deep learning,” in International Conference on Advanced Intelligent Systems for Sustainable Development, pp. 224–241, Springer, 2019.spa
dc.relation.referencesA. Raghunandan, P. Raghav, H. R. Aradhya, et al., “Object detection algorithms for video surveillance applications,” in 2018 International Conference on Communication and Signal Processing (ICCSP), pp. 0563–0568, IEEE, 2018.spa
dc.relation.referencesJ. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016.spa
dc.relation.referencesD. Ellinghaus, S. Kurtz, and U. Willhoeft, “LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons,” BMC Bioinformatics, vol. 9, no. 1, pp. 1–14, 2008.spa
dc.relation.referencesJ. D. Valencia and H. Z. Girgis, “LtrDetector: a tool-suite for detecting long terminal repeat retrotransposons de-novo,” BMC Genomics, vol. 20, no. 1, pp. 1–14, 2019.spa
dc.relation.referencesM. Biryukov and K. Ustyantsev, “DARTS: An algorithm for domain-associated retrotransposon search in genome assemblies,” Genes, vol. 13, no. 1, 2022.spa
dc.relation.referencesH. Jung, M.-S. Jeon, M. Hodgett, P. Waterhouse, and S.-i. Eyun, “Comparative evaluation of genome assemblers from long-read sequencing for plants and crops,” Journal of Agricultural and Food Chemistry, vol. 68, no. 29, pp. 7670–7677, 2020. PMID: 32530283.spa
dc.relation.referencesY. Chernyavskaya, X. Zhang, J. Liu, and J. Blackburn, “Long-read sequencing of the zebrafish genome reorganizes genomic architecture,” BMC Genomics, vol. 23, no. 1, pp. 1–13, 2022.spa
dc.relation.referencesY. Suzuki and S. Morishita, “The time is ripe to investigate human centromeres by long-read sequencing,” DNA Research, vol. 28, dsab021, Oct. 2021.spa
dc.relation.referencesY. Jiang, Repetitive DNA sequence assembly. PhD thesis, Deakin University, 2017.spa
dc.relation.referencesT. J. Treangen and S. L. Salzberg, “Repetitive DNA and next-generation sequencing: Computational challenges and solutions,” Nature Reviews Genetics, vol. 13, no. 1, pp. 36–46, 2012.spa
dc.relation.referencesS. Lian, Y. Tu, Y. Wang, X. Chen, and L. Wang, “A repetitive sequence assembler based on next-generation sequencing,” Genetics and Molecular Research, vol. 15, no. 3, pp. 1–13, 2016.spa
dc.relation.referencesM. Zytnicki, E. Akhunov, and H. Quesneville, “Tedna: a transposable element de novo assembler,” Bioinformatics, vol. 30, pp. 2656–2658, Jun. 2014.spa
dc.relation.referencesC. Chu, R. Nielsen, and Y. Wu, “REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads,” PLOS ONE, vol. 11, no. 3, pp. 1–17, 2016.spa
dc.relation.referencesR. M. Nowak, “Genome Assembler for Repetitive Sequences,” in Information Technologies in Biomedicine (E. Piętka and J. Kawa, eds.), (Berlin, Heidelberg), pp. 422–429, Springer Berlin Heidelberg, 2012.spa
dc.relation.referencesE. Bao, F. Xie, C. Song, and D. Song, “FLAS: fast and high-throughput algorithm for PacBio long-read self-correction,” Bioinformatics, vol. 35, pp. 3953–3960, Mar. 2019.spa
dc.relation.referencesE. L. van Dijk, Y. Jaszczyszyn, D. Naquin, and C. Thermes, “The Third Revolution in Sequencing Technology,” Trends in Genetics, vol. 34, no. 9, pp. 666–681, 2018.spa
dc.relation.referencesH. Jung, C. Winefield, A. Bombarely, P. Prentis, and P. Waterhouse, “Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes,” Trends in Plant Science, vol. 24, no. 8, pp. 700–724, 2019.spa
dc.relation.referencesS. Shahid and R. K. Slotkin, “The current revolution in transposable element biology enabled by long reads,” Current Opinion in Plant Biology, vol. 54, pp. 49–56, 2020.spa
dc.relation.referencesR.-G. Zhang, Z.-X. Wang, S. Ou, and G.-Y. Li, “TEsorter: lineage-level classification of transposable elements using conserved protein domains,” bioRxiv, 2019.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.subject.proposalLTR retrotransposonseng
dc.subject.proposalMachine Learningeng
dc.subject.proposalDetectioneng
dc.subject.proposalClassificationeng
dc.subject.proposalGenomic Object Detectioneng
dc.subject.proposalK-mer based methodeng
dc.subject.proposalNeural networkseng
dc.subject.unescoAgriculture
dc.subject.unescoArtificial intelligence
dc.type.coarhttp://purl.org/coar/resource_type/c_db06spa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/doctoralThesisspa
dc.type.versioninfo:eu-repo/semantics/publishedVersionspa
oaire.versionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa
dc.description.degreenameDoctor of Engineeringspa
dc.publisher.programDoctorate in Engineeringspa
dc.description.researchgroupResearch Line in Biocomputational Models and Bioinformaticsspa
dc.rights.coarhttp://purl.org/coar/access_right/c_abf2spa

