The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools were developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180,177 distinct proteins with 2133 distinct functional roles. These data come from 173 subsystems and 383 different organisms.
A previous bioinformatics-based search for riboswitches yielded several candidate motifs in eubacteria. One of these motifs commonly resides in the 5' untranslated regions of genes involved in the biosynthesis of queuosine (Q), a hypermodified nucleoside occupying the anticodon wobble position of certain transfer RNAs. Here we show that this structured RNA is part of a riboswitch selective for 7-aminomethyl-7-deazaguanine (preQ1), an intermediate in queuosine biosynthesis. Compared with other natural metabolite-binding RNAs, the preQ1 aptamer appears to have a simple structure, consisting of a single stem-loop and a short tail sequence that together are formed from as few as 34 nucleotides. Despite its small size, this aptamer is highly selective for its cognate ligand in vitro and has an affinity for preQ1 in the low nanomolar range. Relatively compact RNA structures can therefore serve effectively as metabolite receptors to regulate gene expression.
The YgjD/Kae1 family (COG0533) has been on the top-10 list of universally conserved proteins of unknown function for over 5 years. It has been linked to DNA maintenance in bacteria and mitochondria and to transcription regulation and telomere homeostasis in eukaryotes, but its actual function has never been found. Based on a comparative genomic and structural analysis, we predicted that this family is involved in the biosynthesis of N6-threonylcarbamoyl adenosine (t6A), a universal modification found at position 37 of tRNAs decoding ANN codons. This prediction was confirmed: a yeast mutant lacking Kae1 is devoid of t6A. t6A− strains were also used to reveal that t6A has a critical role in restricting the initiation codon to AUG and in restricting frameshifting at tandem ANN codons. We also showed that YaeZ, a YgjD paralog, is required for YgjD function in vivo in bacteria. This work lays the foundation for understanding the pleiotropic role of this universal protein family.