Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.

Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted-repeat transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, false discovery rate (FDR), and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.

Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
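The six performance metrics named in the Results can all be derived from a single confusion matrix. The sketch below is illustrative only and is not taken from the EDTA code base: it uses the standard textbook definitions, and the `Confusion` class, the `metrics` function, and the example counts are hypothetical. It assumes the counts are in whatever unit the benchmark uses (for example, base pairs of the genome that a program does or does not annotate as TE, compared with the curated annotation).

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for confusion-matrix counts (any consistent unit)."""
    tp: int  # annotated as TE, and TE in the curated reference
    fp: int  # annotated as TE, but not TE in the reference
    tn: int  # not annotated, and not TE in the reference
    fn: int  # not annotated, but TE in the reference


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else nan


def metrics(c: Confusion) -> dict[str, float]:
    """Compute the six metrics using their standard definitions."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    precision = _ratio(c.tp, c.tp + c.fp)
    fdr = _ratio(c.fp, c.tp + c.fp)  # equals 1 - precision
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)  # harmonic mean of precision and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "FDR": fdr,
        "F1": f1,
    }


# Made-up counts, for illustration only.
example = metrics(Confusion(tp=80, fp=20, tn=880, fn=20))
```

Reporting FDR alongside specificity is useful when true negatives dominate, as they can in a genome where most sequence is not the TE class being tested: specificity can stay high while a large share of the annotated sequence is still wrong.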
We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.
Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. Numerous methods exist for each class of elements, but their relative performance is unknown. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.

Keywords: Transposable element; Annotation; Genome; Benchmarking; Pipeline

Long-read sequencing (e.g., PacBio and Oxford Nanopore) and assembly scaffolding (e.g., Hi-C and BioNano) techniques have progressed rapidly within the last few years. These innovations have been critical for high-quality assembly of the repetitive fraction of genomes. In fact, Ou et al. [8] demonstrated that the assembly contiguity of repetitive sequences in recent long-read assemblies is even better than that of traditional BAC-based reference genomes. With these developments, inexpensive and high-quality assembly of an entire genome is now possible.

Knowing where features (e.g., genes and TEs) lie in a genome assembly is essential for using these assemblies to make biological findings. However, unlike the relatively straightforward and comprehensive pipelines established for gene annotation [9][10][11], current methods for TE annotation can be piecemeal and inaccurate, and are highly specific to classes of transposable elements.

Transposable elements fall into two major classes. Class I elements, also known as retrotransposons, use an RNA intermediate in their "copy and paste" mechanism of transposition [12]. Class I elements can be further divided into long terminal repeat (LTR) retrotransposons, as well as those that lack LTRs (non-LTRs), which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). Structural features of these elements can facilitate automated de novo annotation in a genome assembly. For example, LTR elements have a 5-bp target site duplication (TSD), while non-LTRs have either variable-length TSDs or lack TSDs entirely, being instead associated with deletion of flanking sequences upon insertion [13]. There are also standard terminal sequences associated with LTR elements (i.e., 5'-TG…C/G/TA-3' for LTR-Copia and 5'-TG…CA-3' for LTR-Gypsy elements), and non-LTRs often have a terminal poly-A tail at the 3' end of the element (see [14] for a complete description of structural features of each superfamily).

The second major class of TEs, Class II elements, also known as DNA transposons, use a DNA intermediate in their "cut and paste" mechanism of transposition [15]. As with Class I ...
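
The structural hallmarks described above for LTR elements (a 5-bp TSD flanking the element, and TG at the 5' end with CA or a related dinucleotide at the 3' end) lend themselves to a simple automated check. The following is a minimal sketch under stated assumptions, not the method used by any of the benchmarked programs: the function name `has_ltr_hallmarks`, its parameters, and the toy sequence are hypothetical, coordinates are assumed to be 0-based and half-open, and exact matching is required, whereas practical tools typically tolerate mismatches.

```python
def has_ltr_hallmarks(
    seq: str,
    start: int,
    end: int,
    tsd_len: int = 5,
    head: str = "TG",
    tails: tuple[str, ...] = ("CA",),
) -> bool:
    """Check a candidate element seq[start:end] for exact LTR hallmarks.

    Returns True only if the element begins with `head`, ends with one of
    `tails`, and the `tsd_len` bases immediately upstream are identical to
    the `tsd_len` bases immediately downstream (the target site duplication).
    """
    seq = seq.upper()
    # The candidate needs room for both flanking TSD copies and both termini.
    if start < tsd_len or end + tsd_len > len(seq):
        return False
    if end - start < len(head) + min(len(t) for t in tails):
        return False

    element = seq[start:end]
    left_tsd = seq[start - tsd_len:start]
    right_tsd = seq[end:end + tsd_len]

    return (
        element.startswith(head)
        and element.endswith(tails)  # str.endswith accepts a tuple of suffixes
        and left_tsd == right_tsd
    )


# Toy sequence: flank + TSD + element (TG...CA) + TSD + flank.
toy = "AAA" + "GTCAG" + "TGACGTACGTCA" + "GTCAG" + "TTT"
print(has_ltr_hallmarks(toy, start=8, end=20))
```

To apply the LTR-Copia pattern quoted above (5'-TG…C/G/TA-3'), the caller could pass `tails=("CA", "GA", "TA")`.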