BackgroundProtein sites evolve at different rates due to functional and biophysical constraints. It is usually considered that the main structural determinant of a site’s rate of evolution is its Relative Solvent Accessibility (RSA). However, a recent comparative study has shown that the main structural determinant is the site’s Local Packing Density (LPD). LPD is related with dynamical flexibility, which has also been shown to correlate with sequence variability. Our purpose is to investigate the mechanism that connects a site’s LPD with its rate of evolution.ResultsWe consider two models: an empirical Flexibility Model and a mechanistic Stress Model. The Flexibility Model postulates a linear increase of site-specific rate of evolution with dynamical flexibility. The Stress Model, introduced here, models mutations as random perturbations of the protein’s potential energy landscape, for which we use simple Elastic Network Models (ENMs). To account for natural selection we assume a single active conformation and use basic statistical physics to derive a linear relationship between site-specific evolutionary rates and the local stress of the mutant’s active conformation.We compare both models on a large and diverse dataset of enzymes. In a protein-by-protein study we found that the Stress Model outperforms the Flexibility Model for most proteins. Pooling all proteins together we show that the Stress Model is strongly supported by the total weight of evidence. Moreover, it accounts for the observed nonlinear dependence of sequence variability on flexibility. Finally, when mutational stress is controlled for, there is very little remaining correlation between sequence variability and dynamical flexibility.ConclusionsWe developed a mechanistic Stress Model of evolution according to which the rate of evolution of a site is predicted to depend linearly on the local mutational stress of the active conformation. Such local stress is proportional to LPD, so that this model explains the relationship between LPD and evolutionary rate. Moreover, the model also accounts for the nonlinear dependence between evolutionary rate and dynamical flexibility.
Functional and biophysical constraints result in site-dependent patterns of protein sequence variability. It is commonly assumed that the key structural determinant of site-specific rates of evolution is the Relative Solvent Accessibility (RSA). However, a recent study found that amino acid substitution rates correlate better with two Local Packing Density (LPD) measures, the Weighted Contact Number (WCN) and the Contact Number (CN), than with RSA. This work aims at a more thorough assessment. To this end, in addition to substitution rates, we considered four other sequence variability scores, four measures of solvent accessibility (SA), and other CN measures. We compared all properties for each protein of a structurally and functionally diverse representative dataset of monomeric enzymes. We show that the best sequence variability measures take into account phylogenetic tree topology. More importantly, we show that both LPD measures (WCN and CN) correlate better than all of the SA measures, regardless of the sequence variability score used. Moreover, the independent contribution of the best LPD measure is approximately four times larger than that of the best SA measure. This study strongly supports the conclusion that a site's packing density rather than its solvent accessibility is the main structural determinant of its rate of evolution.
Protein complexes are involved in many biological processes. Examining coupling between subunits of a complex would be useful to understand the molecular basis of protein function. Here, our updated (PS)2 web server predicts the three-dimensional structures of protein complexes based on comparative modeling; furthermore, this server examines the coupling between subunits of the predicted complex by combining structural and evolutionary considerations. The predicted complex structure could be indicated and visualized by Java-based 3D graphics viewers and the structural and evolutionary profiles are shown and compared chain-by-chain. For each subunit, considerations with or without the packing contribution of other subunits cause the differences in similarities between structural and evolutionary profiles, and these differences imply which form, complex or monomeric, is preferred in the biological condition for the subunit. We believe that the (PS)2 server would be a useful tool for biologists who are interested not only in the structures of protein complexes but also in the coupling between subunits of the complexes. The (PS)2 is freely available at http://ps2v3.life.nctu.edu.tw/.
Due to advances in structural biology, an increasing number of protein structures of unknown function have been deposited in Protein Data Bank (PDB). These proteins are usually characterized by novel structures and sequences. Conventional comparative methodology (such as sequence alignment, structure comparison, or template search) is unable to determine their function. Thus, it is important to identify protein's function directly from its structure, but this is not an easy task. One of the strategies used is to analyze whether there are distinctive structure-derived features associated with functional residues. If so, one may be able to identify the functional residues directly from a single structure. Recently, we have shown that protein weighted contact number is related to atomic thermal fluctuations and can be used to derive motional correlations in proteins. In this report, we analyze the weighted contact-number profiles of both catalytic residues and non-catalytic residues for a dataset of 760 structures. We found that catalytic residues have distinct distributions of weighted contact numbers from those of non-catalytic residues. Using this feature, we are able to effectively differentiate catalytic residues from other residues with a single optimized threshold value. Our method is simple to implement and compares favourably with other more sophisticated methods. In addition, we discuss the physics behind the relationship between catalytic residues and their contact numbers as well as other features (such as residue centrality or B-factors) associated with catalytic residues.
Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that the sequence conservation is closely correlated with the weighted contact number (WCN), a measure of packing density for residue's structural environment, calculated only based on the C positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side-chain atoms, or side-chain centroid. To know whether the C atomic positions are adequate to show the relationship between residue environment and sequence conservation or not, here we compared C atoms with other substructures in their contributions to the sequence conservation. Our results show that C positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between C atoms and the other substructures are high, yielding similar structure-conservation relationship. Take the WCN as an example, the average overlapping contribution to sequence conservation is 87% between C and all-atom substructures. These results indicate that only C atoms of a protein structure could reflect sequence conservation at the residue level.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.