Background: Predicting the effect of single point variations on protein stability constitutes a crucial step toward understanding the relationship between protein structure and function. To this end, several methods have been developed to predict changes in the Gibbs free energy of unfolding (∆∆G) between wild type and variant proteins, using sequence and structure information. Most available methods, however, do not exhibit the anti-symmetric prediction property, which guarantees that the predicted ∆∆G value for a variation is the exact opposite of that predicted for the reverse variation, i.e., ∆∆G(A → B) = −∆∆G(B → A), where A and B are amino acids.
Results: Here we introduce simple anti-symmetric features, based on evolutionary information, which are combined to define an untrained method, DDGun (DDG untrained). DDGun is a simple approach based on evolutionary information that predicts the ∆∆G for single and multiple variations from sequence and structure information (DDGun3D). Our method achieves remarkable performance without any training on the experimental datasets, reaching Pearson correlation coefficients between predicted and measured ∆∆G values of ~0.5 and ~0.4 for single and multiple site variations, respectively. Surprisingly, DDGun's performance is comparable with that of state-of-the-art methods. DDGun also naturally predicts multiple site variations, thereby defining a benchmark method for both single site and multiple site predictors. DDGun is anti-symmetric by construction, predicting the ∆∆G of a reciprocal variation as almost equal (depending on the sequence profile) to −∆∆G of the direct variation. This is a valuable property that is missing in the majority of the methods.
Conclusions: Evolutionary information alone, combined in an untrained method, can achieve remarkably high performance in the prediction of ∆∆G upon protein mutation. Non-trained approaches like DDGun represent a valid benchmark both for scoring the predictive power of the individual features and for assessing the learning capability of supervised methods.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2923-1) contains supplementary material, which is available to authorized users.
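The anti-symmetry property stated in the abstract above, ∆∆G(A → B) = −∆∆G(B → A), can be illustrated with a minimal sketch. The score below is a toy example, not one of DDGun's actual features: it assumes that a score is defined as the difference of a per-residue property, which makes it anti-symmetric by construction. The names `HYDROPATHY`, `toy_score`, and `is_antisymmetric` are hypothetical, and the table holds only a small subset of the Kyte-Doolittle hydropathy scale.

```python
# Kyte-Doolittle hydropathy values for a few residues (illustrative subset only).
HYDROPATHY = {
    "A": 1.8, "G": -0.4, "I": 4.5, "L": 3.8,
    "V": 4.2, "D": -3.5, "K": -3.9,
}


def toy_score(wild: str, variant: str) -> float:
    """Toy score for the variation wild -> variant.

    Assumption: the score is the difference of a per-residue property,
    so swapping the two residues flips the sign.
    """
    return HYDROPATHY[variant] - HYDROPATHY[wild]


def is_antisymmetric(score, residues, tol: float = 1e-9) -> bool:
    """Check score(a, b) == -score(b, a) for every ordered residue pair."""
    return all(
        abs(score(a, b) + score(b, a)) <= tol
        for a in residues
        for b in residues
    )


if __name__ == "__main__":
    print(toy_score("A", "V"), toy_score("V", "A"))
    print(is_antisymmetric(toy_score, HYDROPATHY))
```

The same check can be applied to any predictor that exposes a function of the wild-type and variant residues. A score that depends on position-specific context, such as a sequence profile, may satisfy the property only approximately, which is consistent with the "almost equal" wording in the abstract.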
Vertebrate visual phototransduction is perhaps the most well-studied G-protein signaling pathway. A wealth of available biochemical and electrophysiological data has resulted in a rich history of mathematical modeling of the system. However, while the most comprehensive models have relied upon amphibian biochemical and electrophysiological data, modern research typically employs mammalian species, particularly mice, which exhibit significantly faster signaling dynamics. In this work, we present an adaptation of a previously published, comprehensive model of amphibian phototransduction that can produce quantitatively accurate simulations of the murine photoresponse. We demonstrate the ability of the model to predict responses to a wide range of stimuli and under a variety of mutant conditions. Finally, we employ the model to highlight a likely unknown mechanism related to the interaction between rhodopsin and rhodopsin kinase.
Accurate prediction of protein stability changes upon single-site variations (∆∆G) is important for protein design, as well as for our understanding of the mechanisms of genetic diseases. The performance of high-throughput computational methods to this end is evaluated mostly based on the Pearson correlation coefficient between predicted and observed data, assuming that the upper bound would be 1 (perfect correlation). However, the performance of these predictors can be limited by the distribution and noise of the experimental data. Here we estimate, for the first time, a theoretical upper bound to the ∆∆G prediction performance imposed by the intrinsic structure of currently available ∆∆G data. Given a set of measured ∆∆G protein variations, the theoretically "best predictor" is estimated based on its similarity to another set of experimentally determined ∆∆G values. We investigate the correlation between pairs of measured ∆∆G variations, where one is used as a predictor for the other. We analytically derive an upper bound to the Pearson correlation as a function of the noise and distribution of the ∆∆G data. We also evaluate the available datasets to highlight the effect of the noise in conjunction with the ∆∆G distribution. We conclude that the upper bound is a function of both uncertainty and spread of the ∆∆G values, and that with current data the best performance should be between 0.7 and 0.8, depending on the dataset used; higher Pearson correlations might be indicative of overtraining. It also follows that comparisons of predictors using different datasets are inherently misleading.
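The idea of using one set of measurements as a predictor for another can be sketched with a small simulation. This is not the paper's derivation. It relies on a standard result under stated assumptions: if two replicate measurements share the same true value and carry independent, zero-mean noise of equal variance, their expected Pearson correlation is var_true / (var_true + var_noise). The function names, the Gaussian distributions, and the numeric values (in kcal/mol) are hypothetical choices for illustration and are not estimates from any dataset.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def expected_replicate_correlation(sd_true, sd_noise):
    """Expected correlation between two noisy replicates of the same values.

    Assumes independent, zero-mean noise with equal variance in both replicates.
    """
    return sd_true ** 2 / (sd_true ** 2 + sd_noise ** 2)


def simulate_replicate_correlation(sd_true, sd_noise, n=20000, seed=0):
    """Draw hypothetical true values, measure them twice with noise, correlate."""
    rng = random.Random(seed)
    true_values = [rng.gauss(0.0, sd_true) for _ in range(n)]
    first = [t + rng.gauss(0.0, sd_noise) for t in true_values]
    second = [t + rng.gauss(0.0, sd_noise) for t in true_values]
    return pearson(first, second)


if __name__ == "__main__":
    # Hypothetical spread and noise levels, chosen only for illustration.
    for sd_true, sd_noise in [(2.0, 0.5), (2.0, 1.0), (1.0, 1.0)]:
        print(
            sd_true,
            sd_noise,
            round(expected_replicate_correlation(sd_true, sd_noise), 3),
            round(simulate_replicate_correlation(sd_true, sd_noise), 3),
        )
```

In this sketch, narrowing the spread of the true values or increasing the measurement noise both lower the achievable correlation, which mirrors the abstract's conclusion that the bound depends on both the uncertainty and the spread of the data.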