Mega-scale experimental analysis of protein folding stability in biology and design

Tsuboyama, Kaoru; Dauparas, Justas; Chen, Jonathan; Laine, Élodie; Behbahani, Yasser Mohseni; Weinstein, Jonathan; Mangan, Niall M.; Ovchinnikov, S. G.; Rocklin, Gabriel J.

doi:10.1038/s41586-023-06328-6

Cited by 141 publications

(126 citation statements)

References 86 publications

Supporting

Mentioning

121

Contrasting


“…Due to these limitations, as well as the enormous resource cost of most current DMS methodologies, they are unlikely to replace computational prediction tools as the main avenue to fully understanding functional effects of missense mutations in the near future. However, an exciting new methodology, dubbed cDNA display proteolysis, was recently shown to be capable of assessing functional variant effects on protein thermodynamic stability at tremendous scale and speed (Tsuboyama et al, 2022). While limited to a stability phenotype, such a DMS approach also presents a valuable opportunity to gleam insight into the mechanisms of LOF disease, further test the accuracy of current computational tools on a large independent dataset and use it for training and developing better methodologies.…”

Section: Discussion (mentioning)

confidence: 99%

Correspondence between functional scores from deep mutational scans and predicted effects on protein stability

2023


Many methodologically diverse computational methods have been applied to the growing challenge of predicting and interpreting the effects of protein variants. As many pathogenic mutations have a perturbing effect on protein stability or intermolecular interactions, one highly interpretable approach is to use protein structural information to model the physical impacts of variants and predict their likely effects on protein stability and interactions. Previous efforts have assessed the accuracy of stability predictors in reproducing thermodynamically accurate values and evaluated their ability to distinguish between known pathogenic and benign mutations. Here, we take an alternate approach, and explore how well stability predictor scores correlate with functional impacts derived from deep mutational scanning (DMS) experiments. In this work, we compare the predictions of 9 protein stability-based tools against mutant protein fitness values from 49 independent DMS datasets, covering 170,940 unique single amino acid variants. We find that FoldX and Rosetta show the strongest correlations with DMS-based functional scores, similar to their previous top performance in distinguishing between pathogenic and benign variants. For both methods, performance is considerably improved when considering intermolecular interactions from protein complex structures, when available. Furthermore, using these two predictors, we derive a "Foldetta" consensus score, which improves upon the performance of both, and manages to match dedicated variant effect predictors in reflecting variant functional impacts. Finally, we also highlight that predicted stability effects show consistently higher correlations with certain DMS experimental phenotypes, particularly those based upon protein abundance, and, in certain cases, can significantly outcompete sequence-based variant effect prediction methodologies for predicting functional scores from DMS experiments.


“…Since then, we observed that the additional set of sequences had a limited impact on the performance (average Δ ρ̄ = 0.012 on the dataset reported (Hopf et al, 2017)). Hence, in more recent studies (Tsuboyama et al, 2023; Mohseni Behbahani et al, 2023), we solely relied on an input alignment generated with the ProteinGym-MSA protocol. In the present work, for all calculations, we asked GEMME to exploit only a single input MSA generated by one of the four tested protocols and resources (see Additional file 1: Supplementary Methods for computational details).…”

Section: Methods (mentioning)

confidence: 99%

“…GEMME is freely available for the community through a stand-alone package and a web server. It proved useful for discovering functionally important sites in proteins (Tsuboyama et al, 2023; Cagiada et al, 2023), classifying variants of the human glucokinase (Gersing et al, 2023) and transmembrane proteins (Tiemann et al, 2023), among others, and for deciphering the molecular mechanisms underlying diseases such as the Lynch syndrome (Abildgaard et al, 2023).…”

Section: Introduction (mentioning)

confidence: 99%

Alignment-based protein mutational landscape prediction: doing more with less

Abakarova, Marquet, Rera et al. 2022

Preprint

Recent efforts for democratising protein structure prediction have leveraged the MMseqs2 algorithm to efficiently generate multiple sequence alignments with high diversity and a limited number of sequences. Here, we investigated the usefulness of this strategy for mutational outcome prediction. We place ourselves in a context where we only exploit information coming from the input alignment for making predictions. Through a large-scale assessment of ≈1.5M missense variants across 72 protein families, we show that the MMseqs2-based protocol implemented in ColabFold compares favourably with tools and resources relying on profile-Hidden Markov Models. Our study demonstrates the feasibility of simultaneously providing high-quality and compute-efficient alignment-based predictions for the mutational landscape of entire proteomes.


“…Interestingly, these numbers have barely changed over the past decade, indicating that a qualitative paradigm shift might be needed to advance ML-based prediction of mutation-induced protein stability changes. It is possible that large new data sets could provide the necessary boost, and some exciting studies collecting such data are already appearing: cDNA display proteolysis was recently used to measure the thermodynamic stability of around 850 000 single-point and selected double-point mutants of 354 natural and 188 de novo designed protein domains between 40 and 72 amino acids in length …”

Section: Protein Engineering Tasks Solved By Machine Learning (mentioning)

confidence: 99%

“…[132] It is possible that large new data sets could provide the necessary boost, and some exciting studies collecting such data are already appearing: cDNA display proteolysis was recently used to measure the thermodynamic stability of around 850 000 single-point and selected double-point mutants of 354 natural and 188 de novo designed protein domains between 40 and 72 amino acids in length. [140] Changes in catalytic activity upon mutation also attract the attention of ML researchers. Predicting mutational effects on enzyme activity is more challenging than predicting protein stability and solubility due to the enormous diversity of enzymatic mechanisms.…”

Section: Supervised Learning To Predict the Effects Of Mutations (mentioning)

confidence: 99%

Machine Learning-Guided Protein Engineering

Kouba, Kohout, Haddadi et al. 2023

ACS Catal.

Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
