2023
DOI: 10.1038/s41586-023-06328-6

Mega-scale experimental analysis of protein folding stability in biology and design

Abstract: Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale [1]. However, the energetics driving folding are invisible in these structures and remain largely unknown [2]. The hidden thermodynamics of folding can drive disease [3,4], shape protein evolution [5–7] and guide protein engineering [8–10], and new approaches are needed to reveal these thermodynamics for every sequence and structure. Here we present cDNA display proteolysis, a method for measuring…

Cited by 141 publications (126 citation statements)
References 86 publications
“…Due to these limitations, as well as the enormous resource cost of most current DMS methodologies, they are unlikely to replace computational prediction tools as the main avenue to fully understanding functional effects of missense mutations in the near future. However, an exciting new methodology, dubbed cDNA display proteolysis, was recently shown to be capable of assessing functional variant effects on protein thermodynamic stability at tremendous scale and speed (Tsuboyama et al, 2022). While limited to a stability phenotype, such a DMS approach also presents a valuable opportunity to glean insight into the mechanisms of LOF disease, further test the accuracy of current computational tools on a large independent dataset and use it for training and developing better methodologies.…”
Section: Discussion (mentioning)
confidence: 99%
“…Since then, we observed that the additional set of sequences had a limited impact on the performance (average Δ ρ̄ = 0.012 on the dataset reported (Hopf et al, 2017)). Hence, in more recent studies (Tsuboyama et al, 2023; Mohseni Behbahani et al, 2023), we solely relied on an input alignment generated with the ProteinGym-MSA protocol. In the present work, for all calculations, we asked GEMME to exploit only a single input MSA generated by one of the four tested protocols and resources (see Additional file 1: Supplementary Methods for computational details).…”
Section: Methods (mentioning)
confidence: 99%
“…GEMME is freely available for the community through a stand-alone package and a web server. It proved useful for discovering functionally important sites in proteins (Tsuboyama et al, 2023; Cagiada et al, 2023), classifying variants of the human glucokinase (Gersing et al, 2023) and transmembrane proteins (Tiemann et al, 2023), among others, and for deciphering the molecular mechanisms underlying diseases such as the Lynch syndrome (Abildgaard et al, 2023).…”
Section: Introduction (mentioning)
confidence: 99%
“…Interestingly, these numbers have barely changed over the past decade, indicating that a qualitative paradigm shift might be needed to advance ML-based prediction of mutation-induced protein stability changes. It is possible that large new data sets could provide the necessary boost, and some exciting studies collecting such data are already appearing: cDNA display proteolysis was recently used to measure the thermodynamic stability of around 850 000 single-point and selected double-point mutants of 354 natural and 188 de novo designed protein domains between 40 and 72 amino acids in length…”
Section: Protein Engineering Tasks Solved By Machine Learning (mentioning)
confidence: 99%
“…[132] It is possible that large new data sets could provide the necessary boost, and some exciting studies collecting such data are already appearing: cDNA display proteolysis was recently used to measure the thermodynamic stability of around 850 000 single-point and selected double-point mutants of 354 natural and 188 de novo designed protein domains between 40 and 72 amino acids in length [140]. Changes in catalytic activity upon mutation also attract the attention of ML researchers. Predicting mutational effects on enzyme activity is more challenging than predicting protein stability and solubility due to the enormous diversity of enzymatic mechanisms.…”
Section: Supervised Learning To Predict the Effects Of Mutations (mentioning)
confidence: 99%