Many of the sequenced bacterial and archaeal genomes encode regions of viral provenance. Yet, not all of these regions encode bona fide viruses. Gene transfer agents (GTAs) are thought to be former viruses that are now maintained in genomes of some bacteria and archaea and are hypothesized to enable exchange of DNA within bacterial populations. In Alphaproteobacteria, genes homologous to the “head–tail” gene cluster that encodes structural components of the Rhodobacter capsulatus GTA (RcGTA) are found in many taxa, even if they are only distantly related to Rhodobacter capsulatus. Yet, in most genomes available in GenBank RcGTA-like genes have annotations of typical viral proteins, and therefore are not easily distinguished from their viral homologs without additional analyses. Here, we report a “support vector machine” classifier that quickly and accurately distinguishes RcGTA-like genes from their viral homologs by capturing the differences in the amino acid composition of the encoded proteins. Our open-source classifier is implemented in Python and can be used to scan homologs of the RcGTA genes in newly sequenced genomes. The classifier can also be trained to identify other types of GTAs, or even to detect other elements of viral ancestry. Using the classifier trained on a manually curated set of homologous viruses and GTAs, we detected RcGTA-like “head–tail” gene clusters in 57.5% of the 1,423 examined alphaproteobacterial genomes. We also demonstrated that more than half of the in silico prophage predictions are instead likely to be GTAs, suggesting that in many alphaproteobacterial genomes the RcGTA-like elements remain unrecognized.
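The abstract above says the classifier separates RcGTA-like genes from viral homologs using the amino acid composition of the encoded proteins. As a minimal sketch of that kind of feature (not the published classifier's code), the following computes a 20-dimensional composition vector for one protein sequence. The function name `aa_composition` and the example sequence are hypothetical; a vector like this could then be passed to any support vector machine implementation, for example scikit-learn's `sklearn.svm.SVC`.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# protein maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(protein_seq):
    """Return the relative frequency of each standard amino acid.

    Non-standard symbols (e.g. X, *, gaps) are ignored.
    The name and interface are illustrative assumptions.
    """
    seq = protein_seq.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, used only to show the call.
features = aa_composition("MKKLAAGG")
```

Fixing the residue order keeps vectors from different proteins comparable, which is what a composition-based classifier needs.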
Gene transfer agents (GTAs) are virus-like particles encoded and produced by many bacteria and archaea. Unlike viruses, GTAs package fragments of the host genome instead of the genes that encode the components of the GTA itself. As a result of this non-specific DNA packaging, GTAs can transfer genes within bacterial and archaeal communities. GTAs clearly evolved from viruses and are thought to have been maintained in prokaryotic genomes due to the advantages associated with their DNA transfer capacity. The most-studied GTA is produced by the alphaproteobacterium Rhodobacter capsulatus (RcGTA), which packages random portions of the host genome at a lower DNA density than usually observed in tailed bacterial viruses. How the DNA packaging properties of RcGTA evolved from those of the ancestral virus remains unknown. To address this question, we reconstructed the evolutionary history of the large subunit of the terminase (TerL), a highly conserved enzyme used by viruses and GTAs to package DNA. We found that RcGTA-like TerLs grouped within viruses that employ the headful packaging strategy. Because distinct mechanisms of viral DNA packaging correspond to differences in the TerL amino acid sequence, our finding suggests that RcGTA evolved from a headful packaging virus. Headful packaging is the least sequence-specific mode of DNA packaging, which would facilitate the switch from packaging of the viral genome to packaging random pieces of the host genome during GTA evolution.
Keywords: Virus exaptation, GTA, Rhodobacter capsulatus, support vector machine, binary classification, carbon depletion

... value < 0.001; query and subject overlap by at least 60% of their length) and PSI-BLASTP searches (E-value < 0.001; query and subject overlap by at least 40% of their length; maximum of six iterations) of the viral RefSeq database release 90 (last accessed in November 2018; accession numbers of the viral entries are provided in Supplementary Table S2). BLASTP and PSI-BLAST executables were from the BLAST v. 2.6.0+ package (Altschul et al. 1997). The obtained homologs are listed in Supplementary
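The search thresholds quoted above (E-value below 0.001 and a minimum query/subject overlap of 60% or 40%) can be illustrated with a small hit filter. This is only a sketch under stated assumptions: the `Hit` record and `passes_filter` function are hypothetical, and "overlap" is interpreted here as the aligned region covering at least the given fraction of both the query and the subject, which the fragment does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise search hit (hypothetical record layout)."""
    query_len: int          # full length of the query protein
    subject_len: int        # full length of the subject protein
    aligned_query: int      # query residues covered by the alignment
    aligned_subject: int    # subject residues covered by the alignment
    evalue: float


def passes_filter(hit, max_evalue=0.001, min_overlap=0.6):
    """Keep a hit if its E-value is below the cutoff and the alignment
    covers at least `min_overlap` of both sequences (assumed meaning
    of 'query and subject overlap')."""
    query_cov = hit.aligned_query / hit.query_len
    subject_cov = hit.aligned_subject / hit.subject_len
    return (hit.evalue < max_evalue
            and query_cov >= min_overlap
            and subject_cov >= min_overlap)


# Toy hit: covers 80% of the query but only 40% of the subject.
hit = Hit(query_len=100, subject_len=200,
          aligned_query=80, aligned_subject=80, evalue=1e-5)
strict = passes_filter(hit, min_overlap=0.6)    # 60% threshold
relaxed = passes_filter(hit, min_overlap=0.4)   # 40% threshold
```

Passing the overlap threshold as a parameter lets the same function express both the 60% and the 40% criteria.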
Gene transfer agents (GTAs) are virus-like elements integrated into bacterial genomes, particularly those of Alphaproteobacteria. GTAs can be induced under nutritional stress, incorporate random fragments of bacterial DNA into mini-phage particles, lyse the host cells and infect neighboring bacteria, thus enhancing horizontal gene transfer. We show that GTA genes evolve under pronounced positive selection for the reduction of the energy cost of protein production, as shown by comparison of their amino acid compositions with those of both homologous viral genes and host genes. The energy saving in GTA genes is comparable to or even more pronounced than that in the genes encoding the most abundant, essential bacterial proteins. In cases when viruses acquire genes from GTAs, the bias in amino acid composition disappears in the course of evolution, showing that reduction of the energy cost of protein production is an important factor in the evolution of GTAs but not of bacterial viruses. These findings strongly suggest that GTAs are bacterial adaptations rather than selfish, virus-like elements. Because GTA production kills the host cell and does not propagate the GTA genome, it appears likely that GTAs are retained in the course of evolution via kin or group selection. Therefore, we hypothesize that GTAs facilitate the survival of bacterial populations under energy-limiting conditions through the spread of metabolic and transport capabilities via horizontal gene transfer and through the increase in nutrient availability resulting from the altruistic suicide of GTA-producing cells.

Importance: Kin and group selection remain controversial topics in evolutionary biology. We argue that these types of selection are likely to operate in bacterial populations by showing that bacterial gene transfer agents (GTAs), but not related viruses, evolve under positive selection for the reduction of the energy cost of GTA particle production.
We hypothesize that GTAs are dedicated devices for the survival of bacteria under the conditions of nutrient limitation. The benefits conferred by GTAs under nutritional stress appear to include horizontal dissemination of genes that could provide bacteria with enhanced capabilities for nutrient utilization and the increase of nutrient availability through the lysis of GTA-producing bacteria.
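The comparison of energy cost described above rests on weighting amino acid composition by a per-residue biosynthetic cost. The following is a minimal sketch of that arithmetic, not the authors' method: `mean_cost_per_residue` is a hypothetical helper, and the cost values in `toy_costs` are arbitrary placeholders chosen for illustration, not measured biosynthetic costs. A real analysis would substitute a published cost table.

```python
def mean_cost_per_residue(protein_seq, cost_table):
    """Average per-residue cost of a protein under a given cost table.

    Residues missing from `cost_table` are skipped. Both the function
    name and the interface are illustrative assumptions.
    """
    residues = [aa for aa in protein_seq.upper() if aa in cost_table]
    if not residues:
        raise ValueError("no residues with a known cost")
    return sum(cost_table[aa] for aa in residues) / len(residues)


# Placeholder costs in arbitrary units: NOT real biosynthetic values.
toy_costs = {"A": 1.0, "G": 1.0, "W": 5.0}

cheap = mean_cost_per_residue("AAAG", toy_costs)    # only low-cost residues
costly = mean_cost_per_residue("AWWG", toy_costs)   # two high-cost residues
```

Under this toy table the second sequence has the higher mean cost, which is the kind of difference a composition-based comparison between GTA genes and their viral homologs would look for.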