Kapeel Chougule scite author profile

Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.

Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.

Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
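The six performance metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions; the function name `annotation_metrics` and the example counts are hypothetical, and treating the counts as base pairs classified against a curated reference is an assumption, not something the abstract specifies.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def annotation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives, true
    negatives, and false negatives (assumed here to be base pairs).
    """
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": _ratio(tp, tp + fn),            # share of true TE sequence recovered
        "specificity": _ratio(tn, tn + fp),            # share of non-TE sequence left unannotated
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,                        # share of annotated sequence that is correct
        "fdr": _ratio(fp, tp + fp),                    # false discovery rate, equal to 1 - precision
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),        # harmonic mean of precision and sensitivity
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in annotation_metrics(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")
```

Returning NaN for an undefined ratio, rather than raising an error, keeps a table of results intact when a program annotates nothing for a given TE class.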


Genomes of 13 domesticated and wild rice relatives highlight genetic conservation, turnover and innovation across the genus Oryza

Stein et al. 2018


De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes

Hufford, Seetharam, Woodhouse, et al. 2021

Science

408

487


We report de novo genome assemblies, transcriptomes, annotations, and methylomes for the 26 inbreds that serve as the founders for the maize nested association mapping population. The number of pan-genes in these diverse genomes exceeds 103,000, with approximately a third found across all genotypes. The results demonstrate that the ancient tetraploid character of maize continues to degrade by fractionation to the present day. Excellent contiguity over repeat arrays and complete annotation of centromeres revealed additional variation in major cytological landmarks. We show that combining structural variation with single-nucleotide polymorphisms can improve the power of quantitative mapping studies. We also document variation at the level of DNA methylation and demonstrate that unmethylated regions are enriched for cis-regulatory elements that contribute to phenotypic variation.


Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline

Liao et al. 2019

Preprint

173

239


Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.

Keywords: Transposable element; Annotation; Genome; Benchmarking; Pipeline

Long-read sequencing (e.g., PacBio and Oxford Nanopore) and assembly scaffolding (e.g., Hi-C and BioNano) techniques have progressed rapidly within the last few years. These innovations have been critical for high-quality assembly of the repetitive fraction of genomes. In fact, Ou et al. [8] demonstrated that the assembly contiguity of repetitive sequences in recent long-read assemblies is even better than traditional BAC-based reference genomes. With these developments, inexpensive and high-quality assembly of an entire genome is now possible.

Knowing where features (i.e., genes, TEs, etc.) exist in a genome assembly is important information for using these assemblies for biological findings. However, unlike the relatively straightforward and comprehensive pipelines established for gene annotation [9][10][11], current methods for TE annotation can be piecemeal, inaccurate, and are highly specific to classes of transposable elements.

Transposable elements fall into two major classes. Class I elements, also known as retrotransposons, use an RNA intermediate in their "copy and paste" mechanism of transposition [12]. Class I elements can be further divided into long terminal repeat (LTR) retrotransposons, as well as those that lack LTRs (non-LTRs), which include long interspersed nuclear elements (LINEs), and short interspersed nuclear elements (SINEs). Structural features of these elements can facilitate automated de novo annotation in a genome assembly. For example, LTR elements have a 5-bp target site duplication (TSD), while non-LTRs have either variable length TSDs or lack TSDs entirely, being instead associated with deletion of flanking sequences upon insertion [13]. There are also standard terminal sequences associated with LTR elements (i.e., 5'-TG…C/G/TA-3' for LTR-Copia and 5'-TG…CA-3' for LTR-Gypsy elements), and non-LTRs often have a terminal poly-A tail at the 3' end of the element (see [14] for a complete description of structural features of each superfamily).

The second major class of TEs, Class II elements, also known as DNA transposons, use a DNA intermediate in their "cut and paste" mechanism of transposition [15]. As with Class I...
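The LTR structural features described in this excerpt (a 5-bp TSD and 5'-TG…CA-3' or 5'-TG…C/G/TA-3' termini) can be turned into a simple candidate check. The sketch below is illustrative only and is not how EDTA or any benchmarked program works; the function names, the 0-based half-open coordinates, and the toy sequence are all assumptions.

```python
def has_tsd(genome: str, start: int, end: int, tsd_len: int = 5) -> bool:
    """Check whether identical tsd_len-bp sequences flank genome[start:end].

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    if start < tsd_len or end + tsd_len > len(genome):
        return False  # not enough flanking sequence to compare
    left = genome[start - tsd_len:start].upper()
    right = genome[end:end + tsd_len].upper()
    return left == right


def matches_ltr_termini(element: str, three_prime_ends=("CA",)) -> bool:
    """Check for a 5'-TG start and one of the allowed 3' dinucleotides."""
    seq = element.upper()
    return len(seq) >= 4 and seq.startswith("TG") and seq[-2:] in three_prime_ends


# Allowed 3' ends taken from the motifs quoted in the text.
GYPSY_ENDS = ("CA",)
COPIA_ENDS = ("CA", "GA", "TA")

if __name__ == "__main__":
    # Toy sequence: 2 bp flank, 5 bp TSD, 12 bp "element", 5 bp TSD, 2 bp flank.
    genome = "CC" + "ACGTT" + "TGAACCGGTTCA" + "ACGTT" + "GG"
    start, end = 7, 19
    print(has_tsd(genome, start, end))
    print(matches_ltr_termini(genome[start:end], GYPSY_ENDS))
    print(matches_ltr_termini(genome[start:end], COPIA_ENDS))
```

Exact string matching is used for simplicity; real elements accumulate mutations, so a practical tool would likely tolerate mismatches in both the TSD and the terminal motif.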


