2023
DOI: 10.1101/2022.12.31.522396
Preprint

New mega dataset combined with deep neural network makes a progress in predicting impact of mutation on protein stability

Abstract: Prediction of protein stability change due to a single mutation is important for biotechnology, medicine, and our understanding of the physics underlying protein folding. Despite the recent tremendous success in 3D protein structure prediction, the apparently simpler problem of predicting the effect of mutations on protein stability has been hampered by the small amount of experimental data. With the recent high-throughput measurements of mutational effects in a 'mega' experiment for ~850,000 mutations [Tsuboyama et al.…
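For reference, the quantity being predicted is a difference of folding free energies. A common convention, assumed here because datasets differ in sign, defines it through the unfolding free energy $\Delta G_{\text{unf}} = G_{\text{unfolded}} - G_{\text{folded}}$, so that a larger $\Delta G_{\text{unf}}$ means a more stable protein:

$$
\Delta\Delta G = \Delta G_{\text{unf}}^{\text{mut}} - \Delta G_{\text{unf}}^{\text{wt}}
$$

Under this convention, a mutant with $\Delta G_{\text{unf}}^{\text{mut}} = 1.5$ kcal/mol in a wild type with $\Delta G_{\text{unf}}^{\text{wt}} = 4.0$ kcal/mol has $\Delta\Delta G = 1.5 - 4.0 = -2.5$ kcal/mol, i.e. the mutation is destabilizing. These numbers are illustrative only.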

Cited by 6 publications (6 citation statements). References: 28 publications.
“…First, the dynamic range of the proteolysis assay is limited to ~5 kcal/mol (19), while experimental stability datasets such as our Fireprot dataset may include mutations with ΔΔG° values of up to ±10 kcal/mol. This means models trained on Megascale have limited capability to predict large changes in stability, a property that we also observe in other recently published models utilizing the Megascale dataset (16, 26). Second, we found that surface mutations to cysteine were often observed to be highly stabilizing in the Megascale dataset, such that ThermoMPNN would heavily favor surface cysteine mutations unless cysteine was omitted from the permitted residue options (Supplementary Fig.…”
Section: Discussion, mentioning (confidence: 56%)
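The two practical consequences described in this excerpt, saturated labels and the need to exclude cysteine when ranking designs, can be illustrated with a small sketch. The symmetric ±5 kcal/mol window, the `A23V`-style mutation strings, and the helper names `clip_to_assay_range` and `rank_mutations` are hypothetical assumptions for illustration; this is not ThermoMPNN's code.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative assumption: the excerpt says the assay's dynamic range is ~5 kcal/mol;
# treating it as a symmetric +/-5 window is a simplification for this sketch.
ASSAY_LIMIT_KCAL = 5.0


def clip_to_assay_range(ddg: float, limit: float = ASSAY_LIMIT_KCAL) -> float:
    """Value a saturating assay could report for a given true ddG (kcal/mol)."""
    return max(-limit, min(limit, ddg))


def rank_mutations(
    predictions: Dict[str, float],
    excluded_targets: Iterable[str] = ("C",),
    higher_is_more_stable: bool = True,
) -> List[Tuple[str, float]]:
    """Rank point mutations written like 'A23V' (hypothetical format), best first.

    Mutations whose target residue (last character) is in excluded_targets are
    dropped before ranking, i.e. cysteine is removed from the permitted options.
    Set higher_is_more_stable=False for datasets where negative ddG is stabilizing.
    """
    banned = set(excluded_targets)
    kept = [(m, d) for m, d in predictions.items() if m[-1] not in banned]
    return sorted(kept, key=lambda item: item[1], reverse=higher_is_more_stable)


if __name__ == "__main__":
    # Toy predicted values, not data from any paper.
    preds = {"A23V": 0.8, "K40C": 2.1, "L12P": -3.4}
    print(rank_mutations(preds))        # the K40C entry is filtered out
    print(clip_to_assay_range(-9.0))    # a -9 kcal/mol effect would read as -5.0
```

Because every training label passes through something like `clip_to_assay_range`, a model fit to such data never sees targets beyond the window, which is one plausible reading of why it can under-predict the magnitude of very large effects.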
Citing preprint: bioRxiv, posted July 30, 2023; doi: 10.1101/2023.07.27.550881
“…Recent achievements using large language models (LLMs) for protein structure prediction have inspired models using pre-learned sequence embeddings to train models for various protein design tasks via transfer learning (15), including for sequence-based stability prediction (16, 17). At the same time, Dauparas et al. released ProteinMPNN, a message-passing neural network (MPNN) trained on 19,700 protein clusters comprising the entire Protein Data Bank (PDB) (after quality filtering) to recover native-like sequences from a given protein backbone (18).…”
Section: Introduction, mentioning (confidence: 99%)
“…Our comprehensive assessments demonstrate that EpHod outperforms numerous computational methods in pHopt prediction, further reinforcing the growing consensus that semi-supervised strategies employing protein language model embeddings facilitate state-of-the-art performance across various tasks (44, 67, 84, 85). Nevertheless, our analyses reveal that the performance of language models for pHopt prediction varies considerably despite extensive hyperparameter optimization (Figure 2A, 3B-C). Moreover, the relative performance of these language models on our pHopt task differs from that on other tasks.…”
Section: Discussion, mentioning (confidence: 91%)
“…This allows ThermoMPNN to reweight the input vector using contextual information via self-attention. Light attention has recently been shown to improve sequence-based protein localization (15) and ΔΔG° prediction (16) from LLM sequence embeddings, but this work utilizes light attention for refinement of structural embeddings. The adjusted embedding is then passed through a small multilayer perceptron (MLP) with two hidden layers (Fig.…”
Section: Results, mentioning (confidence: 99%)
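The pipeline this excerpt describes (attention-based reweighting of an embedding, then a small MLP with two hidden layers) can be sketched in plain Python. The sketch follows the generic light-attention formulation, in which each channel gets its own softmax over positions and the attention-weighted sum is concatenated with per-channel max pooling. It assumes a convolution kernel size of 1, ReLU activations, and hand-set toy weights; the function names are hypothetical, and this is not ThermoMPNN's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Bias-free linear map: one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def light_attention(x: List[Vector], w_val: Matrix, w_att: Matrix) -> Vector:
    """Pool L position embeddings into one vector of length 2 * channels.

    A kernel-size-1 convolution is just a per-position linear map, so values
    and attention logits are computed with matvec at every position.
    """
    values = [matvec(w_val, xl) for xl in x]
    logits = [matvec(w_att, xl) for xl in x]
    pooled: Vector = []
    maxed: Vector = []
    for c in range(len(values[0])):
        col = [a[c] for a in logits]
        top = max(col)  # subtract the max before exp for numerical stability
        exps = [math.exp(a - top) for a in col]
        total = sum(exps)
        # softmax over positions, computed separately for each channel c
        pooled.append(sum(e / total * v[c] for e, v in zip(exps, values)))
        maxed.append(max(v[c] for v in values))
    return pooled + maxed


def mlp_two_hidden(
    z: Vector,
    hidden: List[Tuple[Matrix, Vector]],
    head: Vector,
    head_bias: float,
) -> float:
    """Two ReLU hidden layers (passed in `hidden`), then a linear scalar output."""
    assert len(hidden) == 2, "this sketch expects exactly two hidden layers"
    h = z
    for w, b in hidden:
        h = [max(0.0, s + bi) for s, bi in zip(matvec(w, h), b)]
    return sum(hi * wi for hi, wi in zip(h, head)) + head_bias


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 positions, 2 channels
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]  # equal logits -> uniform attention
    z = light_attention(x, identity, zeros)
    hidden = [
        ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]], [0.0, 0.0]),
        ([[1.0, 1.0]], [0.0]),
    ]
    print(z, mlp_two_hidden(z, hidden, head=[2.0], head_bias=-0.5))
```

With all-zero attention logits, the pooled half reduces to a plain mean over positions. Non-zero logits let the model emphasize some positions per channel, which is the "reweighting" the excerpt refers to.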
“…Recent achievements using large language models (LLMs) for protein structure prediction have inspired models using prelearned sequence embeddings to train models for various protein design tasks via transfer learning (15), including for sequence-based stability prediction (16, 17). At the same time, Dauparas et al. released ProteinMPNN, a message-passing neural network (MPNN) trained on 19,700 protein clusters comprising the entire Protein Data Bank (PDB) (after quality filtering) to recover native-like sequences from a given protein backbone (18).…”
mentioning (confidence: 99%)