Computational protein design facilitates the continued development of methods for the design of biomolecular structure, sequence and function. Recent applications include the design of novel protein sequences and structures, proteins incorporating nonbiological components, protein assemblies, soluble variants of membrane proteins, and proteins that modulate membrane function.
Keywords
Computational protein design; Nanostructures; Structural biology; Self-assembly; Protein library; Protein symmetry; Nanoparticle; Optoelectronic; Cofactor; Porphyrin; Membrane protein
Overview
Natural proteins possess a wide variety of selective functionalities, including folding, self-assembly, catalysis and molecular recognition. Since the folded state of a protein is in most cases dictated by its sequence of amino acids, structure can potentially be specified through the careful selection of sequence. Nature leverages the physicochemical properties of the amino acids to arrive at highly functional sequences that fold spontaneously, with structural and functional properties tuned during the course of evolution. Well-structured proteins may also be realized through the careful design of sequences, but doing so can be nontrivial. Proteins are large, comprising tens to thousands of amino acid monomers, and possess many backbone and side-chain degrees of freedom. As a result, the configurational state space of a protein is large, even if the backbone tertiary structure is predetermined. The stabilizing interactions that guide the protein to its native state are largely noncovalent, and quantitative estimates of stability with respect to unfolding can be difficult to infer. In addition, the large number of possible sequences adds a further combinatorial complexity to protein design: for a modestly sized protein of only 100 amino acids, more than 10^130 sequences are possible even if only the 20 naturally occurring amino acids are used. Nonetheless, theoretical methods have made accessible the design and study of new proteins and protein-based assemblies. Most such methods begin with a target structure, which can be a naturally occurring structure or one that is computationally modeled. A physically motivated objective function that quantifies the consistency of sequences with the target structure is then optimized, so as to identify either individual sequences or the properties of the ensemble of sequences that are consistent with the target structure and any desired functional properties.
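The size of the sequence space quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the only inputs are the chain length (100) and alphabet size (20) stated in the text.

```python
import math

NUM_AMINO_ACIDS = 20   # naturally occurring amino acids
CHAIN_LENGTH = 100     # the "modestly sized" protein from the text

# Each position can independently hold any of the 20 amino acids,
# so the number of distinct sequences is 20**100 (exact integer arithmetic).
num_sequences = NUM_AMINO_ACIDS ** CHAIN_LENGTH

# The same quantity on a log scale: 100 * log10(20).
log10_sequences = CHAIN_LENGTH * math.log10(NUM_AMINO_ACIDS)

print(f"log10(number of sequences) = {log10_sequences:.2f}")
print("exceeds 10^130:", num_sequences > 10 ** 130)
```

Because log10(20) is slightly larger than 1.3, the count 20^100 is slightly larger than 10^130, which is why exhaustive enumeration of sequences is not an option even for small proteins.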
Algorithmic techniques for identifying low-energy sequences include dead-end elimination, Monte Carlo simulated annealing, genetic algorithms, and optimization theory approaches [1][2][3][4]. In addition,
saven@sas.upenn.edu
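Of the search techniques listed above, Monte Carlo simulated annealing is perhaps the simplest to illustrate. The sketch below is a toy example and not the method of any cited work: the energy function `toy_energy`, the hydrophobic grouping, the burial pattern, and the cooling parameters are all assumptions chosen only to keep the example self-contained. A real design calculation would replace `toy_energy` with a physically motivated objective function evaluated on the target structure.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # illustrative grouping (assumption)

def toy_energy(sequence, burial):
    """Hypothetical energy: one penalty unit for each position whose
    residue class (hydrophobic or not) mismatches the burial pattern."""
    return sum(
        1 for residue, is_buried in zip(sequence, burial)
        if (residue in HYDROPHOBIC) != is_buried
    )

def anneal(burial, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo over sequences with geometric cooling."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in burial]
    energy = toy_energy(seq, burial)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a single-site mutation.
        pos = rng.randrange(len(seq))
        old_residue = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        delta = toy_energy(seq, burial) - energy
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy += delta
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old_residue  # reject: restore the previous residue
    return "".join(best_seq), best_energy

# Hypothetical target: every third position of a 30-residue chain is buried.
burial = [i % 3 == 0 for i in range(30)]
designed, energy = anneal(burial)
print(designed, energy)
```

Accepting some uphill moves at high temperature lets the search escape local minima, while the gradual cooling makes it increasingly greedy. On this toy landscape the positions are independent, so the search is easy; realistic energy functions couple positions through pairwise interactions, which is what makes the design problem hard.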
NIH Public Access
Protein re-engineering
The activation domain of human procarboxypeptidase A2 has been redesigned, resulting in a variant with 68% of the wild-type sequence mutated. The redesigned protein is over 10 kcal/mol more stable than the wild-type protein, and the high-resolution crystal structure and solution NMR structures are effectively superimposable with the com...