Efficient Algorithms and Software for Detection of Full-Length LTR Retrotransposons

Kalyanaraman, Anantharaman; Aluru, Srinivas

doi:10.1142/s021972000600203x

Cited by 34 publications

(25 citation statements)

References 35 publications

“…This provides correct identification of fragmented TEs in nested repeat clusters, but reconstruction of whole TEs and evolutionary timeline of insertions is not possible. LTR retrotransposon detection software, such as LTR_struct (McCarthy and McDonald, 2003; Kalyanaraman and Aluru, 2005), groups LTR pairs based on sequence alignment identity. With LTR pair locations, one can infer a general retrotransposon insertion order; however, nested repeats are not specifically addressed and an LTR broken from subsequent insertions will not be identified.…”

mentioning

confidence: 99%

TEnest: Automated Chronological Annotation and Visualization of Nested Plant Transposable Elements

Kronmiller

Wise

2007

Plant Physiology

Organisms with a high density of transposable elements (TEs) exhibit nesting, with subsequent repeats found inside previously inserted elements. Nesting splits the sequence structure of TEs and makes annotation of repetitive areas challenging. We present TEnest, a repeat identification and display tool made specifically for highly repetitive genomes. TEnest identifies repetitive sequences and reconstructs separated sections to provide full-length repeats and, for long-terminal repeat (LTR) retrotransposons, calculates age since insertion based on LTR divergence. TEnest provides a chronological insertion display to give an accurate visual representation of TE integration history showing timeline, location, and families of each TE identified, thus creating a framework from which evolutionary comparisons can be made among various regions of the genome. A database of repeats has been developed for maize (Zea mays), rice (Oryza sativa), wheat (Triticum aestivum), and barley (Hordeum vulgare) to illustrate the potential of TEnest software. All currently finished maize bacterial artificial chromosomes totaling 29.3 Mb were analyzed with TEnest to provide a characterization of the repeat insertions. Sixty-seven percent of the maize genome was found to be made up of TEs; of these, 95% are LTR retrotransposons. The rate of solo LTR formation is shown to be dissimilar across retrotransposon families. Phylogenetic analysis of TE families reveals specific events of extreme TE proliferation, which may explain the high quantities of certain TE families found throughout the maize genome. The TEnest software package is available for use on PlantGDB under the tools section

“…As evaluated in Ref. [4], LTR harvest and LTR-FINDER+LTR harvest generated a large number of false-positive LTRs which required further removal through annotations. Most false positives resulted from duplicated genes and, as such, possessed no significant internal protein domains.…”

Section: Results

mentioning

confidence: 99%

“…Several programs have been released for identifying full-length or intact LTRs, such as LTR_STRUCT [3], LTR_PAR [4], FIND_LTR [5], LTR_FINDER [6] and LTR harvest [7]. These tools take into account several major characteristics of LTRs such as the size range of intact LTRs, the distances between two LTRs of intact elements, the presence of target site duplications (TSDs) at each terminal region, the presence of critical sites for reversing transcribing elements for transposition such as the primer binding site (PBS) and the poly purine tract (PPT), and the identity percentage between two LTRs.…”

Section: Introduction

mentioning

confidence: 99%
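The structural characteristics listed in the statement above (LTR size range, distance between the two LTRs, flanking TSDs, and identity between the LTR pair) can be read as a chain of filters over candidate LTR pairs. The following is a minimal sketch of such a filter, not the method of any tool named above. The `LtrCandidate` type, the function names, and every default threshold are hypothetical choices made for illustration, identity is computed without gaps, and PBS/PPT detection is left out.

```python
from dataclasses import dataclass


@dataclass
class LtrCandidate:
    """Hypothetical candidate: two LTR intervals, 0-based, end-exclusive."""
    left_start: int
    left_end: int
    right_start: int
    right_end: int


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter of the two strings."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def has_tsd(seq: str, cand: LtrCandidate, tsd_len: int = 5) -> bool:
    """True if the tsd_len bases flanking the element on each side are identical."""
    if cand.left_start < tsd_len or cand.right_end + tsd_len > len(seq):
        return False
    left_flank = seq[cand.left_start - tsd_len:cand.left_start]
    right_flank = seq[cand.right_end:cand.right_end + tsd_len]
    return left_flank == right_flank


def passes_filters(seq: str, cand: LtrCandidate,
                   min_ltr: int = 100, max_ltr: int = 1000,
                   min_dist: int = 1000, max_dist: int = 15000,
                   min_identity: float = 0.85, tsd_len: int = 5) -> bool:
    """Apply size, distance, identity and TSD checks (thresholds are assumptions)."""
    left_len = cand.left_end - cand.left_start
    right_len = cand.right_end - cand.right_start
    if not (min_ltr <= left_len <= max_ltr and min_ltr <= right_len <= max_ltr):
        return False
    # Distance is measured here between the start positions of the two LTRs.
    if not (min_dist <= cand.right_start - cand.left_start <= max_dist):
        return False
    left = seq[cand.left_start:cand.left_end]
    right = seq[cand.right_start:cand.right_end]
    if ungapped_identity(left, right) < min_identity:
        return False
    return has_tsd(seq, cand, tsd_len)
```

Ordering the checks from cheapest (interval arithmetic) to most expensive (sequence comparison) lets most candidates be rejected before any bases are compared; a real tool would replace the ungapped identity with an alignment score.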

LTR Annotator: Automated Identification and Annotation of LTR Retrotransposons in Plant Genomes

You

Cloutier

Shan

et al.

2015

IJBBB

Long Terminal Repeat transposable element (LTR) is a major type of mobile elements ubiquitous in eukaryotic genomes. They account for a major proportion of many plant genomes and have a prominent impact on the evolution of genome size, structure and function. Although some bioinformatics tools for de novo LTR identification from genome sequences have been developed, an automated and standardized software tool for both LTR identification and annotation would be valuable and essential for comparative analysis of sequenced plant genomes. We present here a Java-based pipeline tool, called LTR Annotator, for automatically and consistently performing genome-wide de novo identification and annotation of LTRs of plant genome sequences. The pipeline first identifies LTRs using both LTR_FINDER and LTR harvest, then performs intensive annotations, and finally sweeps out potentially false-positive LTRs. The pipeline was evaluated using the well curated Arabidopsis genome. High sensitivity (>0.9) was obtained by using LTR harvest or LTR harvest+LTR_FINDER. Ten potentially new intact LTRs were detected. This pipeline provides a comprehensive tool to perform comparative analysis of LTRs for plant genomes, delivering annotated genomic resources for epigenetic and other studies. LTR Annotator is free and available upon request.

“…This is achieved by deploying a strategy of first identifying pairs of exact (maximal, to be precise) matching substrings as "seeds" and extending the seeds outwards through sequence alignment [12]. The rationale is that a substantially long (MinExactMatch) exact match is a necessary but not sufficient indicator for a satisfactory alignment (MinSimilarity); thus, generating pairs of loci with long exact matching pairs provides a good filter to predict potential aligning regions.…”

Section: Phase 1: Candidate Pattern Identification

mentioning

confidence: 99%
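The seed-and-extend strategy quoted above can be illustrated with a short sketch. It rests on two simplifying assumptions that are not taken from the cited paper: a k-mer hash index stands in for whatever index structure the authors use to find exact matches, and an ungapped comparison over a fixed flank stands in for the alignment-based extension. The function names and the `flank` parameter are hypothetical; `min_exact_match` and `min_similarity` mirror the MinExactMatch and MinSimilarity parameters named in the quote.

```python
from collections import defaultdict


def find_seeds(seq, min_exact_match):
    """Yield (i, j), i < j, where an exact match of length >= min_exact_match
    starts and cannot be extended one base to the left."""
    k = min_exact_match
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                i, j = positions[a], positions[b]
                # Skip pairs that are just the interior of a longer match.
                if i > 0 and seq[i - 1] == seq[j - 1]:
                    continue
                yield i, j


def extend_ungapped(seq, i, j, length, flank):
    """Identity of the two copies after widening both by up to `flank` bases."""
    left = min(flank, i)
    right = min(flank, len(seq) - (j + length))
    a = seq[i - left:i + length + right]
    b = seq[j - left:j + length + right]
    return sum(x == y for x, y in zip(a, b)) / len(a)


def candidate_pairs(seq, min_exact_match, min_similarity, flank=20):
    """Return (i, j, match_length, similarity) for seeds that survive extension."""
    hits = []
    for i, j in find_seeds(seq, min_exact_match):
        length = min_exact_match
        # Grow the seed to the right until the exact match is maximal.
        while j + length < len(seq) and seq[i + length] == seq[j + length]:
            length += 1
        similarity = extend_ungapped(seq, i, j, length, flank)
        if similarity >= min_similarity:
            hits.append((i, j, length, similarity))
    return hits
```

The exact-match step acts as the cheap filter the quote describes: only loci that share a long exact substring reach the more expensive comparison, and `min_similarity` then decides which of those are kept. This sketch does not exclude overlapping copies, which low-complexity runs can produce.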

“…Substantial research over the last decade has led to the development of several excellent repeat identification methods and software tools [2,9,12,14,18,20,24]. While these methods differ from one another in their underlying algorithms and approaches, most of them share the following set of characteristics in their general approach towards repeat identification: (i) detection based on sequence similarity, (ii) targeting specific types of repeats, and (iii) assuming that the set of structural attributes that characterize each of their target repeat classes is known a priori to the user so that they can be provided as part of the input.…”

Section: Introduction

mentioning

confidence: 99%

An information theoretic approach for the discovery of irregular and repetitive patterns in genomic data

Davis

Kalyanaraman

Cook

2008

2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology

The unprecedented rate at which genomic data is accumulated underscores the need to develop highly efficient and powerful analytical capabilities. Traditionally, most of the effort post-sequencing has been focused on the identification and annotation of genes and their associated sequences such as promoters and regulatory elements. However, a major part of the vastness outside the gene-space is still left unexplored because of a lack of appropriate computational tools. Here, we propose a new approach for exploring and describing a genome without biasing the search process towards already known structural entities. Our primary objective is to discover novel conserved patterns that would typically fall off the scope of the current suite of repeat finding tools because of irregularities in their structure. The output is a hierarchy of patterns with arbitrary structural characteristics. A hierarchical representation captures the genomic sequence content at an abstract level and offers novel ways to examine the information contained in them. Our approach is an information theoretic search process which uses pattern matching techniques for processing the sequence data. Preliminary evaluation on the Drosophila genome has resulted in the finding of a number of irregular patterns, including a histone gene cluster. Discovering new patterns is an important problem in both whole- and comparative genomic application domains. It is our intent to use this research as a launch pad towards developing a comprehensive information-theoretic framework for conducting pattern and knowledge discovery on genomic data.
