Open-source code implementing Probalign, code for producing the simulated data, and all real and simulated data are freely available from http://www.cs.njit.edu/usman/probalign
A distance constraint model (DCM) is presented that identifies flexible regions within protein structure consistent with specified thermodynamic conditions. The DCM is based on a rigorous free energy decomposition scheme that represents structure as fluctuating constraint topologies. Entropy non-additivity is problematic for naive decompositions and limits the success of heat capacity predictions. The DCM resolves non-additivity by summing over independent entropic components determined by an efficient network-rigidity algorithm. A minimal three-parameter DCM is shown to accurately reproduce experimental heat capacity curves. Free energy landscapes and quantitative stability-flexibility relationships are obtained in terms of global flexibility. Several connections to experiment are made.
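The step from a free energy decomposition to a heat capacity curve can be illustrated with standard statistical mechanics. The sketch below is not the DCM itself: it assumes each macrostate i already has an enthalpy H_i and an entropy S_i (in the DCM the entropies would come from the network-rigidity calculation), forms Boltzmann weights exp(-(H_i - T S_i)/(RT)), and obtains the heat capacity from enthalpy fluctuations, C_p = (⟨H²⟩ - ⟨H⟩²)/(RT²). The function names and the toy two-state values are hypothetical.

```python
import math
from typing import Sequence, Tuple

R = 8.314  # gas constant, J/(mol*K)


def ensemble_averages(H: Sequence[float], S: Sequence[float], T: float) -> Tuple[float, float]:
    """Return (<H>, <H^2>) for macrostates with enthalpy H_i (J/mol) and entropy S_i (J/(mol*K))."""
    # Free energy of each macrostate at temperature T.
    G = [h - T * s for h, s in zip(H, S)]
    # Shift by the lowest free energy so the exponentials cannot overflow.
    g0 = min(G)
    w = [math.exp(-(g - g0) / (R * T)) for g in G]
    Z = sum(w)  # (shifted) partition function
    mean_h = sum(wi * h for wi, h in zip(w, H)) / Z
    mean_h2 = sum(wi * h * h for wi, h in zip(w, H)) / Z
    return mean_h, mean_h2


def heat_capacity(H: Sequence[float], S: Sequence[float], T: float) -> float:
    """Heat capacity from the enthalpy fluctuation formula, in J/(mol*K)."""
    mean_h, mean_h2 = ensemble_averages(H, S, T)
    return (mean_h2 - mean_h ** 2) / (R * T ** 2)


# Hypothetical two-state protein: native (0, 0) and unfolded (400 kJ/mol, 1.2 kJ/(mol*K)).
H_states = [0.0, 400e3]
S_states = [0.0, 1200.0]
Tm = H_states[1] / S_states[1]  # populations are equal at T = dH/dS (about 333 K)
curve = [(T, heat_capacity(H_states, S_states, T)) for T in range(300, 361, 5)]
```

For this toy system the curve peaks close to T_m = ΔH/ΔS, and at T_m the heat capacity equals ΔH²/(4RT_m²), as expected for two equally populated states.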
Many reports qualitatively describe conserved stability and flexibility profiles across protein families, but biophysical modeling schemes that robustly quantify both have not been available. Here we investigate an orthologous RNase H pair using a minimal distance constraint model (DCM). The DCM is an all-atom microscopic model [Jacobs and Dallakyan, Biophys J 2005;88(2):903-915] that accurately reproduces heat capacity measurements [Livesay et al., FEBS Lett 2004;576(3):468-476], and it is unique in its ability to calculate thermodynamic stability and flexibility self-consistently in practical computing times. Consequently, quantified stability/flexibility relationships (QSFR) can be determined using the DCM. For the first time, a comparative QSFR analysis is performed, serving as a paradigm study that illustrates the utility of QSFR analysis for elucidating evolutionarily conserved stability and flexibility profiles. Despite global conservation of QSFR profiles, distinct enthalpy-entropy compensation mechanisms are identified between the RNase H pair. In both cases, local flexibility metrics parallel H/D exchange experiments by correctly identifying the folding core and several flexible regions. Remarkably, at appropriately shifted temperatures (e.g., the melting temperature), these differences lead to a global conservation of the Landau free energy landscapes, which directly relate thermodynamic stability to global flexibility. Using ensemble-based sampling within free energy basins, rigidly and flexibly correlated regions are quantified through cooperativity correlation plots. Five conserved flexible regions are identified within the structures of the orthologous pair. Evolutionary conservation of these flexibly correlated regions strongly suggests their catalytic importance. The conclusions made herein are shown to be robust with respect to the DCM parameterization.
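A Landau free energy landscape expresses free energy as a function of an order parameter, here global flexibility. As a generic illustration (not the DCM's own calculation, which the abstract does not detail), the sketch below assumes we have sampled values of a hypothetical integer flexibility order parameter θ at temperature T and estimates G(θ) = -RT ln P(θ), shifted so the most probable value sits at zero. The function name and sample data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable

R = 8.314e-3  # gas constant, kJ/(mol*K)


def landau_free_energy(theta_samples: Iterable[int], T: float) -> Dict[int, float]:
    """Histogram estimate of G(theta) = -RT ln P(theta), in kJ/mol relative to the minimum.

    Assumes a non-empty sample of a discrete order parameter theta
    (e.g. a count of independent degrees of freedom).
    """
    counts = Counter(theta_samples)
    total = sum(counts.values())
    g = {theta: -R * T * math.log(n / total) for theta, n in counts.items()}
    g_min = min(g.values())
    # Report the landscape in increasing theta, with the global minimum at 0.
    return {theta: gi - g_min for theta, gi in sorted(g.items())}


# Hypothetical samples: theta = 10 is twice as likely as 20 or 30.
samples = [10] * 50 + [20] * 25 + [30] * 25
landscape = landau_free_energy(samples, 300.0)
```

A basin in this landscape is a range of θ around a local minimum; comparing landscapes at shifted temperatures (such as each protein's melting temperature) is the kind of comparison the abstract describes.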
In this report, we demonstrate that phylogenetic motifs, sequence regions conserving the overall familial phylogeny, represent a promising approach to protein functional site prediction. Across our structurally and functionally heterogeneous data set, phylogenetic motifs consistently correspond to functional sites defined by both surface loops and active site clefts. Additionally, the partially buried prosthetic group regions of cytochrome P450 and succinate dehydrogenase are identified as phylogenetic motifs. In nearly all instances, phylogenetic motifs are structurally clustered, despite little overall sequence proximity, around key functional site features. Based on calculated false-positive expectations and standard motif identification methods, we show that phylogenetic motifs are generally conserved in sequence. This result implies that they can be considered motifs in the traditional sense as well. However, there are instances where phylogenetic motifs are not (overall) well conserved in sequence. This point is enticing, because it implies that phylogenetic motifs are able to identify key sequence regions that traditional motif-based approaches would not. Further, phylogenetic motif results are also shown to be consistent with evolutionary trace results, and bootstrapping is used to demonstrate tree significance.
Background: We examine the accuracy of enzyme catalytic residue predictions from a network representation of protein structure. In this model, amino acid α-carbons specify vertices within a graph, and edges connect vertices that are proximal in structure. Closeness centrality, which has shown promise in previous investigations, is used to identify important positions within the network. Closeness centrality, a global measure of network centrality, is calculated as the reciprocal of the average distance between vertex i and all other vertices.
Results: We benchmark the approach against 283 structurally unique proteins within the Catalytic Site Atlas. Our results, which are in line with previous investigations of smaller datasets, indicate that closeness centrality predictions are statistically significant. However, unlike previous approaches, we specifically focus on residues with the very best scores. Over the top five closeness centrality scores, we observe an average true-to-false positive rate ratio of 6.8 to 1. As demonstrated previously, adding a solvent accessibility filter significantly improves predictive power; the average ratio increases to 15.3 to 1. We also demonstrate (for the first time) that filtering the predictions by residue identity improves the results even more than accessibility filtering. Here, we simply remove from consideration residues whose physicochemical properties are unlikely to be compatible with catalytic requirements. Residue identity filtering improves the average true-to-false positive rate ratio to 26.3 to 1. Combining the two filters has little effect on the results. Calculated p-values for the three prediction schemes range from 2.7E-9 to less than 8.8E-134. Finally, the sensitivity of the predictions to structure choice and to slight perturbations is examined.
Conclusion: Our results confirm that closeness centrality is a viable prediction scheme whose predictions are statistically significant.
Simple filtering schemes substantially improve the method's predictive power. Moreover, no clear effect on performance is observed when comparing ligated and unligated structures. Similarly, the closeness centrality predictions are robust to slight structural perturbations from molecular dynamics simulation.
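The closeness centrality scheme described above can be sketched in a few lines. Assumptions, labeled as such: "proximal in structure" is taken to mean α-carbons within a distance cutoff (the 7.0 Å default is an illustrative value, not one stated in the text); distances between vertices are shortest-path hop counts computed by breadth-first search; and the residue identity filter is a caller-supplied set of excluded residue names, since the abstract does not list which residues it removes. All function names are hypothetical.

```python
import math
from collections import deque
from typing import AbstractSet, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_graph(ca: Sequence[Coord], cutoff: float = 7.0) -> List[List[int]]:
    """Adjacency lists: residues i and j are linked if their alpha-carbons lie within
    `cutoff` angstroms (the default cutoff is an assumed, illustrative value)."""
    n = len(ca)
    adj: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca[i], ca[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def closeness_centrality(adj: List[List[int]]) -> List[float]:
    """Reciprocal of the mean shortest-path (hop) distance from each vertex to every
    other reachable vertex; isolated vertices score 0."""
    scores = []
    for src in range(len(adj)):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search gives hop distances
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        reachable = len(dist) - 1
        scores.append(reachable / total if total > 0 else 0.0)
    return scores


def top_predictions(scores: Sequence[float], residue_names: Sequence[str],
                    k: int = 5, excluded: AbstractSet[str] = frozenset()) -> List[int]:
    """Indices of the k highest-scoring residues whose names are not in `excluded`
    (a hypothetical residue identity filter)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if residue_names[i] not in excluded][:k]
```

For example, five α-carbons spaced 3.8 Å apart along a line with a 4.0 Å cutoff form a simple chain, so the middle residue has the highest closeness (4/6) and the ends the lowest (4/10). Unweighted hop counts are the simplest distance choice; weighting edges by α-carbon separation would be a natural variant.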