Assessing the performance of computational predictors for estimating protein stability changes upon missense mutations

Iqbal, Shahid; Li, Fuyi; Akutsu, Tatsuya; Ascher, David B.; Webb, Geoffrey I.; Song, Jiangning

doi:10.1093/bib/bbab184

Cited by 42 publications

(35 citation statements)

References 59 publications


“…Another very relevant point is to which extent the methods are affected by $\Delta \Delta G$ measures obtained outside physiological conditions. A recent paper [5] showed that there are some predictors in some extreme ranges of pH and temperature that decreases the performance. S669 dataset was divided into two parts: the former group containing variants whose temperature and pH are in physiological ranges $[293.15, 313.15]$ K (20–40 °C) and $[6.0, 8.0]$, respectively.…”
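The physiological/non-physiological split described in this statement can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the `Variant` record and its field names are hypothetical, and treating a variant with a missing temperature or pH as non-physiological is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Physiological ranges quoted above (closed intervals).
T_RANGE_K = (293.15, 313.15)  # 20–40 °C
PH_RANGE = (6.0, 8.0)


@dataclass
class Variant:
    # Hypothetical record layout; S669 may use different column names.
    mutation: str                   # e.g. "A42G"
    ddg: float                      # experimental ΔΔG in kcal/mol
    temperature_k: Optional[float]  # measurement temperature in kelvin
    ph: Optional[float]             # measurement pH


def in_range(value: Optional[float], bounds: Tuple[float, float]) -> bool:
    """True if value is present and lies inside the closed interval."""
    return value is not None and bounds[0] <= value <= bounds[1]


def split_by_conditions(variants: List[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Split into (physiological, other).

    A variant counts as physiological only if BOTH temperature and pH
    fall inside their ranges; missing values go to 'other' (assumption).
    """
    physiological, other = [], []
    for v in variants:
        if in_range(v.temperature_k, T_RANGE_K) and in_range(v.ph, PH_RANGE):
            physiological.append(v)
        else:
            other.append(v)
    return physiological, other
```

Performance can then be computed separately on each group to see whether a predictor degrades under extreme conditions.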

Section: Results (mentioning)

confidence: 99%

Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset

Pancotti, Benevenuta, Birolo, et al. 2022

Briefings in Bioinformatics

102


Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and for understanding genotype–phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and ‘all’ available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21–0.5 and 0–0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving a Pearson correlation in the range of 0.51–0.62. The tested methods seem relatively insensitive to the physiological conditions, performing well also on the variants measured with more extreme pH and temperature values. A common issue with all the tested methods is the compression of the $\Delta \Delta G$ predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.
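The direct/reverse evaluation summarized in this abstract can be sketched in a few lines of Python. This is a hedged illustration, not the paper's pipeline: it assumes the experimental ΔΔG of a reverse variant is exactly the negative of its direct counterpart, and the names `antisymmetry_report`, `r_dr` (correlation between direct and reverse predictions) and `mean_bias` are hypothetical labels for two antisymmetry checks of the kind used in this literature. A perfectly antisymmetric predictor gives `r_dr = -1` and `mean_bias = 0`.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def antisymmetry_report(exp_dir: Sequence[float],
                        pred_dir: Sequence[float],
                        pred_rev: Sequence[float]) -> Dict[str, float]:
    """Score predictions on direct variants and their reverse counterparts.

    Assumes experimental reverse ΔΔG = -(experimental direct ΔΔG).
    """
    exp_rev = [-v for v in exp_dir]
    return {
        "r_direct": pearson(exp_dir, pred_dir),
        "r_reverse": pearson(exp_rev, pred_rev),
        # Direct and reverse variants pooled into one evaluation set.
        "r_combined": pearson(list(exp_dir) + exp_rev,
                              list(pred_dir) + list(pred_rev)),
        # Antisymmetry checks: ideal values are -1 and 0.
        "r_dr": pearson(pred_dir, pred_rev),
        "mean_bias": sum(d + r for d, r in zip(pred_dir, pred_rev)) / (2 * len(pred_dir)),
    }
```

Passing the same list as both `pred_dir` and `pred_rev` mimics a method that ignores mutation direction: `r_dr` becomes 1 and `r_reverse` takes the opposite sign of `r_direct`.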



“…Several challenges are of particular interest: (i) First, the slope of the regression line for predicted versus experimental ΔΔG values is much smaller than 1 for most methods and the range of the predicted ΔΔG values is narrower than the range of experimental values [47,76,104]. (ii) Almost all methods were trained on data with surplus of destabilizing mutations [68,79]. (iii) Hysteresis causes reverse mutations to not generally have simply inversed ΔΔG values.…”
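Point (i) can be checked directly on any set of predictions. The sketch below, with a hypothetical function name, computes the least-squares slope of predicted on experimental ΔΔG and the ratio of the two value ranges; both fall below 1 when predictions are compressed toward zero.

```python
from typing import Sequence, Tuple


def compression_diagnostics(exp: Sequence[float],
                            pred: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, range_ratio) for predicted vs experimental ΔΔG.

    slope: ordinary least-squares slope of pred regressed on exp.
    range_ratio: (max(pred) - min(pred)) / (max(exp) - min(exp)).
    """
    n = len(exp)
    if n != len(pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(exp) / n, sum(pred) / n
    cov = sum((e - mx) * (p - my) for e, p in zip(exp, pred))
    var = sum((e - mx) ** 2 for e in exp)
    if var == 0:
        raise ValueError("experimental values are constant")
    slope = cov / var
    range_ratio = (max(pred) - min(pred)) / (max(exp) - min(exp))
    return slope, range_ratio
```

For instance, a predictor that always returns half the experimental value yields a slope and range ratio of 0.5 while its Pearson correlation stays at 1, which is why correlation alone does not reveal compression.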

Section: Data Set Biases: Destabilization and Mutation Type (mentioning)

confidence: 99%

Data set and fitting dependencies when estimating protein mutant stability: Toward simple, balanced, and interpretable models

Bæk, Kepp 2022

J Comput Chem


Accurate prediction of protein stability changes upon mutation (ΔΔG) is increasingly important to evolution studies, protein engineering, and screening of disease-causing gene variants, but is challenged by biases in training data. We investigated 45 linear regression models trained on data sets that account systematically for destabilization bias and mutation-type bias ($B_M$). The models were externally validated on three test data sets probing different pathologies and for internal consistency (symmetry and neutrality). Model structure and performance substantially depended on training data and even fitting method. We developed two final models: SimBa-IB for typical natural mutations and SimBa-SYM for situations where stabilizing and destabilizing mutations occur to a similar extent. SimBa-SYM, despite its simplicity, is essentially non-biased (vs. the $S_{\text{sym}}$ data set) while still performing well for all data sets (R ≈ 0.46–0.54, MAE = 1.16–1.24 kcal/mol). The simple models provide advantages in terms of interpretability, use and future improvement, and are freely available on GitHub.
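The destabilization bias discussed in this abstract can be illustrated with a common, generic remedy: augmenting a data set with hypothetical reverse variants whose ΔΔG is negated. This is a sketch of that general technique only, not a description of how the symmetric data set or the SimBa models were built. The mutation-code format (`A42G`) and the sign convention (positive ΔΔG = destabilizing, switchable via a parameter) are assumptions, since data sets differ on both.

```python
from typing import List, Tuple

# (mutation code such as "A42G", ΔΔG in kcal/mol); hypothetical layout.
Record = Tuple[str, float]


def reverse_code(code: str) -> str:
    """'A42G' -> 'G42A' (assumes one-letter wild type + position + mutant)."""
    return code[-1] + code[1:-1] + code[0]


def destabilizing_fraction(data: List[Record],
                           destabilizing_if_positive: bool = True) -> float:
    """Fraction of records whose ΔΔG is destabilizing under the chosen sign."""
    if not data:
        raise ValueError("empty data set")
    sign = 1.0 if destabilizing_if_positive else -1.0
    return sum(1 for _, ddg in data if sign * ddg > 0) / len(data)


def symmetrize(data: List[Record]) -> List[Record]:
    """Return data plus the reverse of every record with ΔΔG negated.

    Assumes perfect antisymmetry, which hysteresis (point iii above)
    may violate for real reverse measurements.
    """
    return data + [(reverse_code(code), -ddg) for code, ddg in data]
```

After `symmetrize`, exactly half of the entries are destabilizing under either sign convention whenever no ΔΔG is zero, which removes the destabilization bias by construction.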


“…Generally, the predicting approaches can be divided into two categories, namely the statistics-based methods using machine learning (ML) and the structure-based methods using physical models (such as those applying force fields). Although the ML-based methods usually exhibit higher computational efficiency and accuracy compared with the physics-based approaches [14, 15], these methods may usually suffer from the problems of difficulty for mechanism explanation. And the ML-based methods tend to show a limited scope of application due to the biased or limited training set.…”

Section: Introduction (mentioning)

confidence: 99%

Predicting the mutation effects of protein–ligand interactions via end-point binding free energy calculations: strategies and analyses

Wang et al. 2022

J Cheminform


Protein mutations occur frequently in biological systems and may affect, for example, the binding of drugs to their targets by impairing critical H-bonds, changing hydrophobic interactions, etc. Thus, accurately predicting the effects of mutations on biological systems is of great interest to various fields. Unfortunately, large-scale wet-lab mutation experiments remain impractical because of their prohibitive time and financial costs. Alternatively, in silico computation can serve as a pioneer to guide the experiments. In fact, numerous pioneering works, ranging from computationally cheaper machine-learning (ML) methods to more expensive alchemical methods, have been conducted to accurately predict mutation effects. However, these methods usually either cannot yield a physically understandable model (ML-based methods) or require huge computational resources (alchemical methods). Thus, compromise methods with good physical characteristics and high computational efficiency are needed. Therefore, here, we conducted a comprehensive investigation of mutation effects in biological systems with the well-known end-point binding free energy methods MM/GBSA and MM/PBSA. Different computational strategies, considering different MD simulation lengths, different dielectric constants, and whether to incorporate entropy effects into the predicted total binding affinities, were investigated to provide a more accurate way of predicting the energetic change upon protein mutations. Overall, our results show that a relatively long MD simulation (e.g. 100 ns) benefits the prediction accuracy of both MM/GBSA and MM/PBSA (with the best Pearson correlation coefficient between the predicted ∆∆G and the experimental data of ~0.44 for a challenging dataset). Further analyses show that systems involving large perturbations (e.g. multiple mutations and a large number of atoms changed at the mutation site) are much easier to predict accurately, since the algorithm is more sensitive to large changes in the system. Besides, system-specific investigation reveals that conformational adjustment is needed to refine the micro-environment of the manually mutated systems, which explains why longer MD simulations improve the prediction results. The proposed strategy is expected to be applied in large-scale investigations of mutation effects with interpretation.
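For reference, the end-point quantities named in this abstract are usually written as follows. This is the textbook MM/PBSA and MM/GBSA decomposition, not necessarily the exact protocol used in the study; the sign convention (mutant minus wild type) is an assumption, and the $-T\Delta S$ term corresponds to the optional entropy correction the abstract mentions.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= \langle G_{\text{complex}} \rangle - \langle G_{\text{receptor}} \rangle - \langle G_{\text{ligand}} \rangle
  = \Delta E_{\text{MM}} + \Delta G_{\text{solv}} - T\Delta S, \\
\Delta E_{\text{MM}} &= \Delta E_{\text{int}} + \Delta E_{\text{ele}} + \Delta E_{\text{vdW}}, \\
\Delta G_{\text{solv}} &= \Delta G_{\text{PB/GB}} + \Delta G_{\text{SA}}, \\
\Delta\Delta G &= \Delta G_{\text{bind}}^{\text{mut}} - \Delta G_{\text{bind}}^{\text{wt}}.
\end{aligned}
```

Here $\langle\cdot\rangle$ denotes an average over MD snapshots, which is why simulation length matters. In the common single-trajectory protocol, $\Delta E_{\text{int}}$ cancels between complex and separated partners.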

