Transposable elements (TE) are one of the major driving forces of genome evolution, raising the question of the long-term dynamics underlying their evolutionary success. Long-term TE evolution can readily be reconstructed in eukaryotes, thanks to many degraded copies constituting genomic fossil records of past TE proliferations. By contrast, bacterial genomes usually experience high sequence turnover and short TE retention times, thereby obscuring ancient TE evolutionary patterns. We found that Wolbachia bacterial genomes contain 52–171 insertion sequence (IS) TEs. IS account for 11% of Wolbachia wRi, which is one of the highest IS genomic coverage reported in prokaryotes to date. We show that many IS groups are currently expanding in various Wolbachia genomes and that IS horizontal transfers are frequent among strains, which can explain the apparent synchronicity of these IS proliferations. Remarkably, >70% of Wolbachia IS are nonfunctional. They constitute an unusual bacterial IS genomic fossil record providing direct empirical evidence for a long-term IS evolutionary dynamics following successive periods of intense transpositional activity. Our results show that comprehensive IS annotations have the potential to provide new insights into prokaryote TE evolution and, more generally, prokaryote genome evolution. Indeed, the identification of an important IS genomic fossil record in Wolbachia demonstrates that IS elements are not always of recent origin, contrary to the conventional view of TE evolution in prokaryote genomes. Our results also raise the question whether the abundance of IS fossils is specific to Wolbachia or it may be a general, albeit overlooked, feature of prokaryote genomes.
BackgroundNext-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs.ResultsIn this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler, rather it combines the outputs of a variety of established assembly packages, and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose built BUSCO databases) also confirmed that our approach yields more information.ConclusionsOur approach yields coding transcriptome assemblies that are more likely to be closer to biological reality than any of the three individual assembly packages we investigated. This approach (freely available as a simple perl script) will be of use to researchers working with species for which there is little or no reference data against which the assembly of a transcriptome can be performed.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1406-x) contains supplementary material, which is available to authorized users.
BackgroundHorizontal transfer of transposable elements (HTT) is increasingly appreciated as an important source of genome and species evolution in eukaryotes. However, our understanding of HTT dynamics is still poor in eukaryotes because the diversity of species for which whole genome sequences are available is biased and does not reflect the global eukaryote diversity.ResultsIn this study we characterized two Mariner transposable elements (TEs) in the genome of several terrestrial crustacean isopods, a group of animals particularly underrepresented in genome databases. The two elements have a patchy distribution in the arthropod tree and they are highly similar (>93% over the entire length of the element) to insect TEs (Diptera and Hymenoptera), some of which were previously described in Ceratitis rosa (Crmar2) and Drosophila biarmipes (Mariner-5_Dbi). In addition, phylogenetic analyses and comparisons of TE versus orthologous gene distances at various phylogenetic levels revealed that the taxonomic distribution of the two elements is incompatible with vertical inheritance.ConclusionsWe conclude that the two Mariner TEs each underwent at least three HTT events. Both elements were transferred once between isopod crustaceans and insects and at least once between isopod crustacean species. Crmar2 was also transferred between tephritid and drosophilid flies and Mariner-5 underwent HT between hymenopterans and dipterans. We demonstrate that these various HTTs took place recently (most likely within the last 3 million years), and propose iridoviruses and/or Wolbachia endosymbionts as potential vectors of these transfers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.