The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.
Here we show that an open reading frame at 9 min on the chromosomal map of E. coli encodes an enzyme (deoxyxylulose-5-phosphate synthase, DXP synthase) that catalyzes a thiamin diphosphate-dependent acyloin condensation reaction between C atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield DXP. We have cloned and overexpressed the gene (dxs), and the enzyme was purified 17-fold to a specific activity of 0.85 unit/mg of protein. The reaction catalyzed by DXP synthase yielded exclusively DXP, which was characterized by ¹H and ³¹P NMR spectroscopy. Although DXP synthase of E. coli shows sequence similarity to both transketolases and the E1 subunit of pyruvate dehydrogenase, it is a member of a distinct protein family, and putative DXP synthase sequences appear to be widespread in bacteria and plant chloroplasts.
The structural characterization of proteins expressed from the genome is a major problem in proteomics. The solution to this problem requires the separation of the protein of interest from a complex mixture, the identification of its DNA-predicted sequence, and the characterization of sequencing errors and posttranslational modifications. For this, the "top down" mass spectrometry (MS) approach, extended by the greatly increased protein fragmentation from electron capture dissociation (ECD), has been applied to characterize proteins involved in the biosynthesis of thiamin, Coenzyme A, and the hydroxylation of proline residues in proteins. With Fourier transform (FT) MS, electrospray ionization (ESI) of a complex mixture from an E. coli cell extract gave 102 accurate molecular weight values (2-30 kDa), but none corresponding to the predicted masses of the four desired enzymes for thiamin biosynthesis (GoxB, ThiS, ThiG, and ThiF). MS/MS of one ion species (representing approximately 1% of the mixture) identified it with the DNA-predicted sequence of ThiS, although the predicted and measured molecular weights were different. Further purification yielded a 2-component mixture whose ECD spectrum characterized both proteins simultaneously as ThiS and ThiG, showing an additional N-terminal Met on the 8 kDa ThiS and removal of an N-terminal Met and Ser from the 27 kDa ThiG. For a second system, the molecular weight of the 45 kDa phosphopantothenoylcysteine synthetase/decarboxylase (CoaBC), an enzyme involved in Coenzyme A biosynthesis, was 131 Da lower than that of the DNA prediction; the ECD spectrum showed that this is due to the removal of the N-terminal Met. For a third system, viral prolyl 4-hydroxylase (26 kDa), ECD showed that multiple molecular ions (+98, +178, etc.) are due to phosphate noncovalent adducts, and MS/MS pinpointed the overall mass discrepancy of 135 Da to removal of the initiation Met (131 Da) and to formation of disulfide bonds (2 × 2 Da) at C32-C49 and C143-C147, although 10 S-S positions were possible. In contrast, "bottom up" proteolysis characterization of the CoaBC and the P4H proteins was relatively unsuccessful. The addition of ECD substantially increases the capabilities of top down FTMS for the detailed structural characterization of large proteins.
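The mass bookkeeping in the abstract above (131 Da for the CoaBC discrepancy, 135 Da for prolyl 4-hydroxylase) can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not the authors' analysis: the function name `expected_mass_shift` is hypothetical, and the monoisotopic masses are standard reference values rather than numbers taken from the abstract.

```python
# Monoisotopic masses in Da (standard reference values, assumed here;
# the abstract itself quotes only the rounded figures 131 and 2 x 2).
MET_RESIDUE = 131.04049   # methionine residue, C5H9NOS
HYDROGEN = 1.007825       # one hydrogen atom


def expected_mass_shift(met_removed: bool, disulfide_bonds: int) -> float:
    """Return how much lighter (in Da) the measured protein should be
    than the DNA-predicted sequence.

    Hypothetical helper: removing the initiator Met subtracts one Met
    residue; each disulfide bond forms with the loss of two hydrogens.
    """
    shift = MET_RESIDUE if met_removed else 0.0
    shift += disulfide_bonds * 2 * HYDROGEN
    return shift


# CoaBC: only the N-terminal Met is removed.
coabc_shift = expected_mass_shift(met_removed=True, disulfide_bonds=0)

# Prolyl 4-hydroxylase: Met removal plus two disulfide bonds.
p4h_shift = expected_mass_shift(met_removed=True, disulfide_bonds=2)

print(round(coabc_shift), round(p4h_shift))
```

Rounded to the nearest dalton, the two shifts agree with the 131 Da and 135 Da discrepancies reported in the abstract.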
Thiamin is synthesized by most prokaryotes and by eukaryotes such as yeast and plants. In all cases, the thiazole and pyrimidine moieties are synthesized in separate branches of the pathway and coupled to form thiamin phosphate. A final phosphorylation gives thiamin pyrophosphate, the active form of the cofactor. Over the past decade or so, biochemical and structural studies have elucidated most of the details of the thiamin biosynthetic pathway in bacteria. Formation of the thiazole requires six gene products, and formation of the pyrimidine requires two. In contrast, details of the thiamin biosynthetic pathway in yeast are only just beginning to emerge. Only one gene product is required for the biosynthesis of the thiazole and one for the biosynthesis of the pyrimidine. Thiamin can also be transported into the cell and can be salvaged through several routes. In addition, two thiamin degrading enzymes have been characterized, one of which is linked to a novel salvage pathway.