Background
The ability to perform de novo biosynthesis of purines is present in organisms in all three domains of life, reflecting the essentiality of these molecules to life. Although the pathway is quite similar in eukaryotes and bacteria, the archaeal pathway is more variable. Careful manual curation of the genes in this pathway demonstrates the value of manual curation in archaea, even for pathways that have been well studied in other domains.

Results
We searched the Integrated Microbial Genomes system (IMG) for the 17 distinct genes involved in the 11 steps of de novo purine biosynthesis in 65 sequenced archaea, finding 738 predicted proteins with sequence similarity to known purine biosynthesis enzymes. Each sequence was manually inspected for the presence of active-site residues and other residues known or suspected to be required for function.

Many apparently purine-biosynthesizing archaea lack evidence for a single enzyme, either glycinamide ribonucleotide formyltransferase or inosine monophosphate cyclohydrolase, suggesting that at least two more gene variants in the purine biosynthetic pathway remain to be discovered. We also identified variations in the domain arrangement of formylglycinamidine ribonucleotide synthetase and substantial problems in the assignments of aminoimidazole carboxamide ribonucleotide formyltransferase and inosine monophosphate cyclohydrolase.

Manual curation revealed some overly specific annotations in IMG gene product names: predicted proteins lacking essential active-site residues were assigned product names implying enzymatic activity (21 proteins, 2.8% of proteins inspected) or Enzyme Commission (E.C.) numbers (57 proteins, 7.7%). Conversely, 57 proteins (7.7%) were assigned overly generic names, and 78 proteins (10.6%) lacked an E.C. number in their assigned name even though a specific enzyme name and E.C. number were well justified.

Conclusions
The patchy distribution of purine biosynthetic genes in archaea is consistent with a pathway that has been shaped by horizontal gene transfer, duplication, and gene loss. Our results indicate that manual curation can improve upon automated annotation for a small number of automatically annotated proteins and can reveal a need to identify further pathway components even in well-studied pathways.

Reviewers
This article was reviewed by Dr. Céline Brochier-Armanet, Dr. Kira S. Makarova (nominated by Dr. Eugene Koonin), and Dr. Michael Galperin.
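The residue check and the annotation percentages described in this abstract can be illustrated with a short Python sketch. This is not the authors' procedure, which was manual inspection in IMG. The sketch assumes that each candidate protein is already aligned to a reference enzyme whose required residues are known. The function names, residue positions, and toy sequences are hypothetical. Only the counts 21, 57, and 78 out of 738 come from the abstract.

```python
"""Hedged sketch of two steps described in the abstract.

Assumptions (not from the paper): the query is already aligned to a
reference enzyme, and required residues are given as 1-based positions
in the ungapped reference sequence. Sequences are hypothetical toys.
"""


def reference_to_column(aligned_ref: str) -> dict[int, int]:
    """Map 1-based ungapped reference positions to alignment column indices."""
    mapping = {}
    pos = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":  # gaps do not advance the reference position
            pos += 1
            mapping[pos] = col
    return mapping


def missing_residues(aligned_ref: str, aligned_query: str,
                     required: dict[int, str]) -> list[int]:
    """Return reference positions where the query lacks the required residue.

    `required` maps a reference position to the expected residue, e.g. {3: "H"}.
    A gap or a different amino acid in the query counts as missing.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    cols = reference_to_column(aligned_ref)
    missing = []
    for pos, residue in sorted(required.items()):
        if aligned_query[cols[pos]].upper() != residue.upper():
            missing.append(pos)
    return missing


def percent(count: int, total: int) -> float:
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    # Hypothetical toy alignment: reference positions 3 (H) and 4 (D) are "required".
    ref = "MK-HDGE"
    query = "MKAHNGE"
    print(missing_residues(ref, query, {3: "H", 4: "D"}))  # prints [4]

    # Counts reported in the abstract, out of 738 inspected proteins.
    for label, n in [("activity name, no active site", 21),
                     ("E.C. number, no active site", 57),
                     ("justified E.C. number missing", 78)]:
        print(label, percent(n, 738))
```

Running the tally reproduces the abstract's figures of 2.8%, 7.7%, and 10.6%. In this sketch, a protein with a non-empty `missing_residues` result would be a candidate for a more generic product name.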
The solution structures of two computationally designed core variants of the β1 domain of streptococcal protein G (Gβ1) were solved by ¹H NMR methods to assess the robustness of amino acid sequence selection by the ORBIT protein design package under changes in protein backbone specification. One variant has mutations at three of 10 core positions and corresponds to minimal perturbations of the native Gβ1 backbone. The other, with mutations at six of 10 positions, was calculated for a backbone in which the separation between Gβ1's α-helix and β-sheet was increased by 15% relative to native Gβ1. Exchange broadening of some resonances and the complete absence of others in spectra of the sixfold mutant bespeak conformational heterogeneity in this protein. The NMR data were nevertheless sufficiently abundant to generate structures of similar, moderately high quality for both variants. Both proteins adopt backbone structures similar to their target folds. Moreover, the sequence selection algorithm successfully predicted all core χ1 angles in both variants, five of six χ2 angles in the threefold mutant, and four of seven χ2 angles in the sixfold mutant. We conclude that ORBIT calculates sequences that fold specifically to a geometry close to the template, even when the template is moderately perturbed relative to a naturally occurring structure. There are apparently limits to the size of acceptable perturbations: in this study, the larger perturbation led to undesired dynamic behavior.
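The χ1/χ2 comparison in this abstract can be illustrated with a short Python sketch. The abstract does not say how a predicted side-chain angle was judged correct. This example assumes that agreement means the predicted and observed dihedrals differ by at most 40° after accounting for the 360° wraparound. That tolerance is chosen here only for illustration. The function names and angle values are hypothetical.

```python
"""Hedged sketch: counting side-chain dihedral (chi) angles predicted correctly.

Assumption (not from the paper): a prediction "agrees" with the NMR
structure when the two angles differ by at most `tolerance` degrees.
"""


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0  # Python's % with a positive divisor yields [0, 360)
    return min(d, 360.0 - d)


def chi_agrees(predicted: float, observed: float, tolerance: float = 40.0) -> bool:
    """True if the predicted dihedral lies within `tolerance` of the observed one."""
    return angular_difference(predicted, observed) <= tolerance


def count_agreements(pairs: list[tuple[float, float]], tolerance: float = 40.0) -> int:
    """Number of (predicted, observed) angle pairs that agree."""
    return sum(chi_agrees(p, o, tolerance) for p, o in pairs)


if __name__ == "__main__":
    # Hypothetical (predicted, observed) chi angles in degrees.
    chi2_pairs = [(-65.0, -58.0), (180.0, -172.0), (60.0, -60.0)]
    n_ok = count_agreements(chi2_pairs)
    print(f"{n_ok} of {len(chi2_pairs)} chi angles agree")  # prints "2 of 3 chi angles agree"
```

Wrapping the difference into the 0 to 180° range matters because, for example, 180° and -172° describe nearly the same trans rotamer even though their naive difference is 352°.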