Determination of the tendencies of amino acids to form alpha-helical and beta-sheet structures has been important in clarifying stabilizing interactions, protein design, and the protein folding problem. In this study, we have determined for the first time a complete scale of amino acid propensities for another important protein motif: the collagen triple-helix conformation with its Gly-X-Y repeating sequence. Guest triplets of the form Gly-X-Hyp and Gly-Pro-Y are used to quantitate the conformational propensities of all 20 amino acids for the X and Y positions in the context of a (Gly-Pro-Hyp)(8) host peptide. The rankings for both the X and Y positions show the highly stabilizing nature of imino acids and the destabilizing effects of Gly and aromatic residues. Many residues show differing propensities in the X versus Y position, related to the nonequivalence of these positions in terms of interchain interactions and solvent exposure. The propensity of amino acids to adopt a polyproline II-like conformation plays a role in their triple-helix rankings, as shown by a moderate correlation of triple-helix propensity with frequency of occurrence in polyproline II-like regions. The high propensity of ionizable residues in the X position suggests the importance of interchain hydrogen bonding directly or through water to backbone carbonyls or hydroxyprolines. The low propensity of side chains with branching at the C(delta) in the Y position supports models suggesting these groups block solvent access to backbone C=O groups. These data provide a first step in defining sequence-dependent variations in local triple-helix stability and binding, and are important for a general understanding of side chain interactions in all proteins.
An algorithm was derived to relate the amino acid sequence of a collagen triple helix to its thermal stability. This calculation is based on the triple helical stabilization propensities of individual residues and their intermolecular and intramolecular interactions, as quantitated by melting temperature values of host-guest peptides. Experimental melting temperature values of a number of triple helical peptides of varying length and sequence were successfully predicted by this algorithm. However, predicted T(m) values are significantly higher than experimental values when there are strings of oppositely charged residues or concentrations of like charges near the terminus. Application of the algorithm to collagen sequences highlights regions of unusually high or low stability, and these regions often correlate with biologically significant features. The prediction of stability from sequence indicates an understanding of the major forces maintaining this protein motif. The use of highly favorable KGE and KGD sequences is seen to complement the stabilizing effects of imino acids in modulating stability and may become dominant in the collagenous domains of bacterial proteins that lack hydroxyproline. The effect of single amino acid mutations in the X and Y positions can be evaluated with this algorithm. An interactive collagen stability calculator based on this algorithm is available online.
The ability to predict structure and stability from amino acid sequence is an important step in the understanding of basic protein principles and the structural consequences of pathological mutations. The vast number of amino acid sequences available from DNA data contrasts with the smaller number of high resolution protein structures and the limited experimental data on protein stability. The ability to make predictions that are in good agreement with experimental data provides insight into the stabilizing interactions within proteins.
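As a rough illustration of a propensity-based stability calculation, the sketch below treats each Gly-X-Y triplet as contributing an additive penalty relative to a Gly-Pro-Hyp reference. The additive form, the function names, and every number are assumptions made for illustration: the abstract does not give the functional form, the interaction terms it mentions are omitted here, and the toy penalty tables are not measured data. Hyp is written as `O` in one-letter code.

```python
from typing import Dict, List


def split_triplets(seq: str) -> List[str]:
    """Split a one-letter sequence into Gly-X-Y triplets (O denotes Hyp)."""
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for t in triplets:
        if t[0] != "G":
            raise ValueError(f"triplet {t!r} does not start with Gly")
    return triplets


def estimate_tm(seq: str, base_tm: float,
                x_penalty: Dict[str, float],
                y_penalty: Dict[str, float]) -> float:
    """Hypothetical additive estimate: start from base_tm (an all-GPO
    reference) and subtract a per-residue penalty for each X and Y residue.
    Pairwise interaction terms are deliberately left out of this sketch."""
    tm = base_tm
    for _gly, x, y in split_triplets(seq):
        tm -= x_penalty[x] + y_penalty[y]
    return tm


# Toy penalty tables in degrees C. These are placeholders, NOT measured values.
toy_x = {"P": 0.0, "A": 2.0}
toy_y = {"O": 0.0, "A": 3.0}

# Hypothetical peptide: GPO-GPO-GAO-GPA-GPO
print(estimate_tm("GPOGPOGAOGPAGPO", 50.0, toy_x, toy_y))
```

In practice the penalty tables would be filled from host-guest melting data, and a faithful version would also need terms for interactions between neighboring residues.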
In addition, there is much interest in computing the effect of single amino acid replacements on protein stability because destabilizing effects are associated with deleterious mutations that result in clinically detectable phenotypes (1-3). In contrast to globular proteins, the relation among sequence, structure, and stability is simpler and better defined for the linear collagen triple helix.
The collagen triple helix motif is found widely in structural proteins of the extracellular matrix and in an increasing set of non-collagenous proteins, many of which are involved in host-defense functions (4, 5). The close packing of three supercoiled polyproline II-like polypeptide chains in the collagen triple helix generates a requirement for Gly as every third residue (6-8). The observation of such a repeating (Gly-X-Y)n sequence pattern over a stretch of residues signifies a triple helix conformation. However, the collagen triple helix is not uniform in structure or stability. Crystal structures of collagen peptides show that variation in amino acid content leads to small but significant variations i...
Proteins with sequence-specific DNA binding function are important for a wide range of biological activities. De novo prediction of their DNA-binding specificities from sequence alone would be a great aid in inferring cellular networks. Here we introduce a method for predicting DNA-binding specificities for Cys2His2 zinc fingers (C2H2-ZFs), the largest family of DNA-binding proteins in metazoans. We develop a general approach, based on empirical calculations of pairwise amino acid–nucleotide interaction energies, for predicting position weight matrices (PWMs) representing DNA-binding specificities for C2H2-ZF proteins. We predict DNA-binding specificities on a per-finger basis and merge predictions for C2H2-ZF domains that are arrayed within sequences. We test our approach on a diverse set of natural C2H2-ZF proteins with known binding specificities and demonstrate that for >85% of the proteins, their predicted PWMs are accurate in 50% of their nucleotide positions. For proteins with several zinc finger isoforms, we show via case studies that this level of accuracy enables us to match isoforms with their known DNA-binding specificities. A web server for predicting a PWM given a protein containing C2H2-ZF domains is available online at http://zf.princeton.edu and can be used to aid in protein engineering applications and in genome-wide searches for transcription factor targets.
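One common way to turn per-position interaction energies into a PWM is Boltzmann weighting, sketched below. This is an assumption for illustration and not necessarily the procedure used by the authors or their web server; the function names, the `kT` scale, and the toy energy table are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def position_energies(contact_residues: List[str],
                      pair_energy: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Sum pairwise residue-base energies for one DNA position.
    Lower energy is taken to mean a more favorable contact."""
    return {b: sum(pair_energy[(aa, b)] for aa in contact_residues)
            for b in BASES}


def energies_to_pwm(energies: List[Dict[str, float]],
                    kT: float = 1.0) -> List[Dict[str, float]]:
    """Convert per-position base energies into probabilities with
    p(b) proportional to exp(-E_b / kT); each column sums to 1."""
    pwm = []
    for column in energies:
        weights = {b: math.exp(-column[b] / kT) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: w / total for b, w in weights.items()})
    return pwm


# Toy energy table (arbitrary units). Placeholder values, NOT fitted energies.
toy = {(aa, b): 0.0 for aa in "RN" for b in BASES}
toy[("R", "G")] = -2.0
toy[("N", "A")] = -1.0

# Two DNA positions, each contacted by a single hypothetical residue.
pwm = energies_to_pwm([position_energies(["R"], toy),
                       position_energies(["N"], toy)])
for column in pwm:
    print({b: round(p, 3) for b, p in column.items()})
```

Merging per-finger predictions for arrayed domains, as the abstract describes, would require an additional step that this sketch does not attempt.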
Important stabilizing features for the collagen triple helix include the presence of Gly as every third residue, a high content of imino acids, and interchain hydrogen bonds. Host-guest peptides have been used previously to characterize triple-helix propensities of individual residues and Gly-X-Y triplets. Here, comparison of the thermal stabilities of host-guest peptides of the form (Gly-Pro-Hyp)3-Gly-X-Y-Gly-X'-Y'-(Gly-Pro-Hyp)3 extends the study to adjacent tripeptide sequences, to encompass the major classes of potential direct intramolecular interactions. Favorable hydrophobic interactions were observed, as well as stabilizing intrachain interactions between residues of opposite charge in the i and i + 3 positions. However, the greatest gain in triple-helix stability was achieved in the presence of Gly-Pro-Lys-Gly-Asp/Glu-Hyp sequences, leading to a T(m) value equal to that seen for a Gly-Pro-Hyp-Gly-Pro-Hyp sequence. This stabilization is seen for Lys but not for Arg and can be assigned to interchain ion pairs, as shown by molecular modeling. Computational analysis shows that Lys-Gly-Asp/Glu sequences are present at a frequency much greater than expected in collagen, suggesting this interaction is biologically important. These results add significantly to the understanding of which surface ion pairs can contribute to protein stability.
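The frequency comparison mentioned above can be sketched as an observed-versus-expected count of Lys-Gly-Asp/Glu, with Lys in a Y position and Asp or Glu in the following X position. The expectation below assumes the X and Y positions are filled independently at their observed frequencies, which is an assumption of this sketch and may differ from the authors' analysis; the function name and the example sequence are hypothetical.

```python
from typing import Tuple


def kgde_observed_expected(seq: str) -> Tuple[int, float]:
    """Count Y-position Lys followed by Gly and an X-position Asp/Glu in a
    (Gly-X-Y)n sequence, and return (observed, expected) where expected
    assumes independent X and Y composition."""
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    if any(seq[i] != "G" for i in range(0, len(seq), 3)):
        raise ValueError("every third residue must be Gly")
    xs = seq[1::3]  # X-position residues, one per triplet
    ys = seq[2::3]  # Y-position residues, one per triplet
    n = len(xs)
    observed = sum(1 for i in range(n - 1)
                   if ys[i] == "K" and xs[i + 1] in "DE")
    f_k = ys.count("K") / n
    f_de = (xs.count("D") + xs.count("E")) / n
    expected = (n - 1) * f_k * f_de  # n - 1 adjacent triplet pairs
    return observed, expected


# Hypothetical sequence (O denotes Hyp): GPK-GDO-GPO-GPK-GEO-GPO
obs, exp = kgde_observed_expected("GPKGDOGPOGPKGEOGPO")
print(obs, round(exp, 3))
```

A ratio of observed to expected well above 1 across real collagen sequences would correspond to the enrichment the abstract reports.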