As part of the Protein Structure Initiative (PSI; www.nigms.nih.gov/funding/psi.html), the BSGC has focused on obtaining 3D structural information for the proteins of two minimal organisms (http://www.strgen.org). These are the closely related pathogens Mycoplasma genitalium and M. pneumoniae, which have fewer than 500 and 700 genes, respectively. Achieving this goal required obtaining 3D structural information for nearly all of their proteins. A large portion of these are hypothetical proteins, that is, proteins with no sequence homology to proteins of known function. We now have 3D structural information for a near-complete structural complement of M. genitalium, and thus a structural genomic view of protein fold usage among these and other minimal microbes [3].
Metrics and Impact of BSGC structures

At the beginning of the PSI, about 30% of M. genitalium "soluble" proteins had no 3D structural fold information. By the completion of PSI-I, about 94% of the "soluble" proteins had 3D structural fold information. This achieved the BSGC's mission of obtaining a near-complete structural complement of a minimal organism. The exercise yielded several metrics:
(1) About half of the proteins with no sequence similarity to proteins in the PDB turned out to have "new folds." The other half turned out to be "remote homologues," whose homology could be identified only through structural similarity to a known fold.
(2) For about two-thirds of the "hypothetical" proteins, the 3D structures suggested testable molecular (biochemical or biophysical) functions, some of which have since been confirmed experimentally.
(3) The overall success rate of the "single-path" (low-hanging fruit) approach was <5% for clone-to-structure and ~9% for purified protein-to-structure.
(4) The overall success rate of the "multi-path" (single-path plus "salvage path") approach was >16% for clone-to-structure and ~27% for purified protein-to-structure.

The overall impact of BSGC structures on functional inference is summarized below:
• 66 BSGC structures belong to 51 protein sequence families.