Crystallization has proven to be the most significant bottleneck to high-throughput protein structure determination using diffraction methods. We have used the large-scale, systematically generated experimental results of the Northeast Structural Genomics Consortium to characterize the biophysical properties that control protein crystallization. Data mining of crystallization results combined with explicit folding studies leads to the conclusion that crystallization propensity is controlled primarily by the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. These analyses identify specific sequence features correlating with crystallization propensity that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the bulk amino acid sequence properties of human versus eubacterial proteins that reflect likely differences in their biophysical properties, including crystallization propensity. Finally, our thermodynamic measurements enable critical evaluation of previous claims regarding correlations between protein stability and bulk sequence properties, which generally are not supported by our dataset.
NIH Public Access Author Manuscript. Nat Biotechnol. Author manuscript; available in PMC 2010 January 1.
Published in final edited form as: Nat Biotechnol. 2009 January; 27(1): 51-57. doi:10.1038/nbt.1514.
The ability to determine the atomic structures of macromolecules represents a great achievement in molecular biology because of the unparalleled value of this information in understanding the fundamental chemistry of life [1][2][3][4][5]. While nuclear magnetic resonance represents an invaluable source of structural information, especially for small proteins, most macromolecular structures are determined using x-ray crystallography. Capitalizing on the recent proliferation of genomic sequence data, "structural genomics" consortia have been organized worldwide to develop methods and infrastructure for high-throughput protein structure determination. These groups have contributed to improvements in expression and structure determination methods [6], and the four largest U.S. consortia accounted for 45% of all novel structures deposited in the Protein Data Bank (PDB) in 2007 [7]. While these efforts contribute to the impressive progress of the structural biology community in characterizing the full repertoire of protein structures, the rate of growth in sequence information nonetheless far out-paces that of structural information. Given the ongoing acceleration of whole-genome sequencing, the gap between the two will continue to expand without a breakthrough in macromolecular structure determination methods.
The systematic efforts of structural genomics projects show that crystallization is the major bottleneck to protein structure determinati...