Transposable elements (TEs) are mobile, repetitive sequences that make up significant fractions of metazoan genomes. Despite their near ubiquity and importance in genome and chromosome biology, most efforts to annotate TEs in genome sequences rely on the results of a single computational program, RepeatMasker. In contrast, recent advances in gene annotation indicate that high-quality gene models can be produced by combining multiple independent sources of computational evidence. To elevate the quality of TE annotations to a level comparable to that of gene models, we have developed a combined evidence-model TE annotation pipeline, analogous to systems used for gene annotation, by integrating results from multiple homology-based and de novo TE identification methods. As proof of principle, we have annotated “TE models” in Drosophila melanogaster Release 4 genomic sequences using the combined computational evidence derived from RepeatMasker, BLASTER, TBLASTX, all-by-all BLASTN, RECON, TE-HMM and the previous Release 3.1 annotation. Our system is designed for use with the Apollo genome annotation tool, allowing automatic results to be curated manually to produce reliable annotations. The euchromatic TE fraction of D. melanogaster is now estimated at 5.3% (cf. 3.86% in Release 3.1), and we found a substantially higher number of TEs (n = 6,013) than previously identified (n = 1,572). Most of the new TEs derive from small fragments a few hundred nucleotides long and from highly abundant families not previously annotated (e.g., INE-1). We also estimated that 518 TE copies (8.6%) are inserted into at least one other TE, forming a nest of elements. The pipeline allows rapid and thorough annotation of even the most complex TE models, including highly deleted and/or nested elements such as those often found in heterochromatic sequences. Our pipeline can be easily adapted to other genome sequences, such as those of the D. melanogaster heterochromatin or other species in the genus Drosophila.
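The idea of combining evidence from several programs into a single "TE model" can be illustrated with a minimal interval-merging sketch. This is not the authors' pipeline: the `Hit` and `Model` types, the `merge_evidence` function, and the coordinates in the usage note are hypothetical, and the sketch assumes each program's output has already been reduced to half-open (start, end) intervals on one sequence. Hits that overlap or touch are merged, and each merged model records which sources support it.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One piece of computational evidence (hypothetical type)."""
    start: int   # 0-based, inclusive
    end: int     # exclusive
    source: str  # name of the program that reported the hit


class Model(NamedTuple):
    """A merged candidate TE model (hypothetical type)."""
    start: int
    end: int
    sources: FrozenSet[str]


def merge_evidence(hits: Iterable[Hit]) -> List[Model]:
    """Merge overlapping or adjacent hits into candidate models."""
    models: List[Model] = []
    for hit in sorted(hits, key=lambda h: (h.start, h.end)):
        if models and hit.start <= models[-1].end:
            # Hit overlaps or touches the current model: extend it.
            last = models[-1]
            models[-1] = Model(
                last.start,
                max(last.end, hit.end),
                last.sources | {hit.source},
            )
        else:
            # Hit starts a new, separate model.
            models.append(Model(hit.start, hit.end, frozenset({hit.source})))
    return models
```

For example, `merge_evidence([Hit(100, 500, "RepeatMasker"), Hit(450, 900, "BLASTER"), Hit(2000, 2300, "RECON")])` yields two models: one spanning 100 to 900 supported by two sources, and one spanning 2000 to 2300 supported by one. A real pipeline would also need to track TE family, strand, and nesting, which this sketch deliberately omits.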
The techniques that are usually used to detect transposable elements (TEs) in nucleic acid sequences rely on sequence similarity with previously characterized elements. However, these methods are likely to miss many elements in various organisms. We tested two strategies for the detection of unknown elements. The first, which we call "TBLASTX strategy," searches for TE sequences by comparing the six-frame translations of the nucleic acid sequences of known TEs with the genomic sequence of interest. The second, "repeat-based strategy," searches genomic sequences for long repeats and clusters them in groups of similar sequences. TE copies from a given family are expected to cluster together. We tested the Drosophila melanogaster genomic sequence and the recently sequenced Anopheles gambiae genome in which most TEs remain unknown. We showed that the "TBLASTX strategy" is very efficient as it detected at least 332 new TE families in D. melanogaster and 400 in A. gambiae. This was unexpected in Drosophila as TEs of this organism have been extensively studied. The "repeat-based strategy" appeared to be very inefficient because of two problems: (i) TE copies are heavily deleted and few copies share homologous regions, and (ii) segmental duplications are frequent and it is not easy to distinguish them from TE copies.
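The six-frame translation underlying the "TBLASTX strategy" can be sketched as follows. This shows only the translation step, not the similarity search that TBLASTX performs on the translated sequences, and it is an illustration rather than the authors' code. The function names are hypothetical, the standard genetic code is assumed, and codons containing characters other than A, C, G, or T are rendered as `X`.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq: str) -> Dict[str, str]:
    """Translate a DNA sequence in three forward and three reverse frames."""
    frames: Dict[str, str] = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Calling `six_frame_translations("ATGGCCTAA")` returns a dictionary with keys `+1`, `+2`, `+3`, `-1`, `-2`, `-3`; frame `+1` reads `MA*`. Comparing at the protein level is what lets this kind of search find diverged elements whose nucleotide sequences no longer align well.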
Genetic and molecular investigations were carried out with Eurasian Drosophila melanogaster populations on the P-M system of hybrid dysgenesis. In 27 strains sampled from France to Middle Asia, a clear gradient exists between Western Europe, in which most modern strains are of the Q type, and eastern areas, in which M-cytotype strains predominate. Molecular analysis on individual flies was performed with two complementary probes of the cloned 2.9-kilobase P element. The results provide evidence for a gradually decreasing frequency of P elements from west to east, but the presence of P-homologous sequences has been ascertained in all of the wild M-cytotype populations analyzed. Moreover, some active P elements with GD sterility potential were revealed in the majority of M-cytotype populations when tested with a highly sensitive reference line. The gradual change in distribution of the polymorphic P family in Eurasia is discussed in relation to the structure of the elements together with the theories of P-M evolution and is interpreted as the present invasion of Eurasian populations by these elements.
About 10% of the genome of Drosophila melanogaster exists as dispersed moderately repetitive sequences belonging to different families (1). The P family is composed of mobile dispersed genetic elements implicated in the P-M system of hybrid dysgenesis. This phenomenon, which is manifested in certain interstrain hybrids, results in a number of correlated aberrant genetic traits, e.g., high frequencies of gonadal sterility (GD sterility), mutation, and male recombination (2). Three types of individuals, P, Q, and M, have been described on the basis of their cross-effect properties. Hybrids between P males and M females show dysgenic traits that are reduced or absent in the reciprocal hybrids. Q individuals do not exhibit GD sterility in any cross-combinations but produce mutations and male recombinations in crosses with M females (3, 4).
In the P-M system, hybrid dysgenesis results from interactions between chromosomally linked factors (P factors) and a particular type of cellular state referred to as the M cytotype (5). The P factors are active genetic elements of the P family, whose members are heterogeneous in size [0.5-2.9 kilobases (kb)], but which share substantial sequence homology (6-8). All P and Q strains thus far examined bear 30-50 copies of the P family (7). Q individuals are thought to carry a subset of the P-element family that apparently lacks sterility potential while retaining mutator activity and other P-element functions (7, 9, 10). Conversely, all long-established laboratory M strains that have been examined completely lack homology with the P-element family (7). Some strains showing the M cytotype but with some homology to P sequences have also been found in laboratory collections (7).
In this paper, such strains will be called M', the term M strain being reserved for strains of the M cytotype with no P homology at all.
The M-cellular state component of the P-M interaction may be considered as a "s...
The recently described THAP domain motif characterizes a DNA-binding domain (DBD) that is widely conserved in humans and other animals. It resembles the DBD of the P element transposase of D. melanogaster. We show here that the Drosophila P neogenes, which derive from P-transposable elements, conserve the THAP domain. Moreover, secondary rearrangements by exon shuffling indicate the recurrent recruitment of this domain by the host genome. As P sequences and THAP genes are found together in many animal genomes, we discuss the possibility that the THAP proteins have acquired their domain as a result of recurrent molecular domestication of P-transposable elements.