Ribonucleotide reductases (RNRs) are ancient enzymes that catalyze the reduction of ribonucleotides to deoxyribonucleotides. They are required for virtually all cellular life and are prominent within viral genomes. RNRs share a common ancestor and must generate a protein radical for direct ribonucleotide reduction. The mechanisms by which RNRs produce radicals are diverse and divide RNRs into three major classes and several subclasses. The diversity of radical generation methods means that cellular organisms and viruses typically contain the RNR best-suited to the environmental conditions surrounding DNA replication. However, such diversity has also fostered high rates of RNR misannotation within subject sequence databases. These misannotations have resulted in incorrect translative presumptions of RNR biochemistry and have diminished the utility of this marker gene for ecological studies of viruses. We discovered a misannotation of the RNR gene within the Prochlorococcus phage P-SSP7 genome, which caused a chain of misannotations within commonly observed RNR genes from marine virioplankton communities. These RNRs are found in marine cyanopodo- and cyanosiphoviruses and are currently misannotated as Class II RNRs, which are O2-independent and require cofactor B12. In fact, these cyanoviral RNRs are Class I enzymes that are O2-dependent and may require a di-metal cofactor made of Fe, Mn, or a combination of the two metals. The discovery of an overlooked Class I β subunit in the P-SSP7 genome, together with phylogenetic analysis of the α and β subunits, confirms that the RNR from P-SSP7 is a Class I RNR. Phylogenetic and conserved residue analyses also suggest that the P-SSP7 RNR may constitute a novel Class I subclass. The reannotation of the RNR clade represented by P-SSP7 means that most lytic cyanophage contain Class I RNRs, while their hosts, B12-producing Synechococcus and Prochlorococcus, contain Class II RNRs.
By using a Class I RNR, cyanophage avoid a dependence on host-produced B12, a more effective strategy for a lytic virus. The discovery of a novel RNR β subunit within cyanopodoviruses also implies that some unknown viral genes may be familiar cellular genes that are too divergent for homology-based annotation methods to identify.
Background: Increasingly, researchers use protein-coding genes from targeted PCR amplification or direct metagenomic sequencing in community and population ecology. Analysis of protein-coding genes presents different challenges from those encountered in traditional SSU rRNA studies. Most protein-coding sequences are annotated based on homology to other computationally annotated sequences, which can lead to inaccurate annotations. Therefore, the results of sensitive homology searches must be validated to remove false positives and assess functionality. Multiple lines of in silico evidence can be gathered by examining conserved domains and residues identified through biochemical investigations. However, manually validating sequences in this way can be time-consuming and error-prone, especially in large environmental studies. Results: An automated pipeline for protein active site validation (PASV) was developed to improve validation and partitioning accuracy for protein-coding sequences, combining multiple sequence alignment with expert domain knowledge. PASV was tested using commonly misannotated proteins: ribonucleotide reductase (RNR), alternative oxidase (AOX), and plastid terminal oxidase (PTOX). PASV partitioned 9,906 putative Class I alpha and Class II RNR sequences from bycatch in a global viral metagenomic investigation with >99% true positive and true negative rates. PASV predicted the class of 2,579 RNR sequences in >98% agreement with manual annotations. PASV correctly partitioned all 336 tested AOX and PTOX sequences. Conclusions: PASV provides an automated and accurate way to address post-homology search validation and partitioning of protein-coding marker genes. Source code is released under the MIT license and is found with documentation and usage examples on GitHub at https://github.com/mooreryan/pasv.
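The general idea of validating a sequence by its conserved residues can be illustrated with a minimal sketch. This is not PASV's implementation or interface: the function names, the toy sequences, the key positions, and the expected signature are all hypothetical, and the alignment is assumed to have been computed beforehand by any multiple sequence aligner.

```python
from typing import Dict, List


def column_of_reference_position(aligned_ref: str, ref_pos: int) -> int:
    """Return the 0-based alignment column that holds the ref_pos-th
    (1-based) ungapped residue of the aligned reference sequence."""
    seen = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            seen += 1
            if seen == ref_pos:
                return col
    raise ValueError(f"reference has fewer than {ref_pos} residues")


def residue_signature(aligned_ref: str, aligned_query: str,
                      key_positions: List[int]) -> str:
    """Collect the query residues that align to the key reference positions."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        aligned_query[column_of_reference_position(aligned_ref, pos)]
        for pos in key_positions
    )


def partition(signature: str, expected: Dict[str, str]) -> str:
    """Assign a label when the signature matches an expected pattern;
    anything else is flagged for manual review."""
    for label, pattern in expected.items():
        if signature == pattern:
            return label
    return "unvalidated"


# Toy data (hypothetical): key residues sit at reference positions 3, 5, 6.
ALIGNED_REF = "MK-CNEC"
KEY_POSITIONS = [3, 5, 6]
EXPECTED = {"class_A": "CEC"}  # hypothetical signature, not real RNR residues

good = residue_signature(ALIGNED_REF, "MKACNEC", KEY_POSITIONS)
bad = residue_signature(ALIGNED_REF, "MKASNQC", KEY_POSITIONS)
print(good, partition(good, EXPECTED))
print(bad, partition(bad, EXPECTED))
```

Mapping reference positions to alignment columns first means the check still works when the query carries insertions relative to the reference, which is why the gap in the toy reference does not shift the residues being inspected.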
The throughput of DNA sequencing continues to increase, allowing researchers to analyze genomes of interest at greater depths. An unintended consequence of this data deluge is the increased cost of analyzing these datasets. As a result, genome and metagenome annotation pipelines are left with a few options: (i) search against smaller reference databases, (ii) use faster, but less sensitive, algorithms to assess sequence similarities, or (iii) invest in computing hardware specifically designed to improve BLAST searches such as GPGPU systems and/or large CPU-rich clusters. We present a pipeline that improves the speed of amino acid sequence homology searches with a minimal decrease in sensitivity and specificity by searching against hierarchical clusters. Briefly, the pipeline requires two homology searches: the first search is against a clustered version of the database and the second is against sequences belonging to clusters with a hit from the first search. We tested this method using two assembled viral metagenomes and three databases (including Metagenomes Online and UniRef100). Hierarchical cluster homology searching proved to be 12 times faster than BLASTp and produced alignments that were nearly identical to BLASTp (precision=0.99; recall=0.97). This approach is ideal when searching large collections of sequences against large databases.
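The two-search design can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline itself: a toy k-mer Jaccard score stands in for BLASTp, clustering is a single flat level rather than a hierarchy, and the sequences, identifiers, and threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets (a stand-in for a real aligner)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def two_stage_search(query: str,
                     clusters: Dict[str, List[str]],
                     sequences: Dict[str, str],
                     threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Search cluster representatives first, then only the members of
    clusters whose representative was hit.

    `clusters` maps a representative id to its member ids (the
    representative itself included)."""
    # Stage 1: compare the query to one representative per cluster.
    hit_clusters = [
        rep for rep in clusters
        if toy_similarity(query, sequences[rep]) >= threshold
    ]
    # Stage 2: compare the query to every member of the hit clusters.
    hits = []
    for rep in hit_clusters:
        for member in clusters[rep]:
            score = toy_similarity(query, sequences[member])
            if score >= threshold:
                hits.append((member, score))
    # Best score first; ties broken by id for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Toy database (hypothetical sequences and ids).
SEQUENCES = {
    "a1": "MKTAYIAKQR",
    "a2": "MKTAYIAKQL",
    "b1": "GGSWPLLVNE",
}
CLUSTERS = {"a1": ["a1", "a2"], "b1": ["b1"]}

for seq_id, score in two_stage_search("MKTAYIAKQR", CLUSTERS, SEQUENCES):
    print(seq_id, round(score, 2))
```

The speedup comes from stage 1 touching only one sequence per cluster. The cost is that a true hit can be missed when its representative scores below the threshold, which is consistent with the abstract reporting recall slightly below 1.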