Evolutionary-scale prediction of atomic level protein structure with a language model
Preprint (2022). DOI: 10.1101/2022.07.20.500902

Abstract: Large language models have recently been shown to develop emergent capabilities with scale, going beyond simple pattern matching to perform higher level reasoning and generate lifelike images and text. While language models trained on protein sequences have been studied at a smaller scale, little is known about what they learn about biology as they are scaled up. In this work we train models up to 15 billion parameters, the largest language models of proteins to be evaluated to date. We find that as models are…

Cited by 683 publications (1,289 citation statements)
References 92 publications (186 reference statements)
“…With regard to the entire protein universe, however, further challenges still remain for GIBAC, when it comes to K d calculations involving intrinsically disordered proteins, because the flexibility involved makes structural modeling and structure-based K d prediction a rather difficult (if possible) problem, to which an experimental approach is perhaps the only feasible solution in practice [54,[96][97][98][99].…”
Section: Conclusion and Discussion (mentioning)
confidence: 99%
“…Further parameters are an early stop criterion of a prediction certainty (pLDDT) above 85 or below 40, a default recycle count of 3 and a compilation of only the best performing out of five AlphaFold2 models. As the goal of LambdaPP is to provide a single reference for pLM-based predictions, 3D structure will soon be predicted using tools just recently presented in the literature (35; 72; 74), which will allow structure prediction to happen in seconds rather than in minutes, at accuracy comparable with MSA-based methods.…”
Section: Methods (mentioning)
confidence: 99%
“…Embeddings from pLMs have been successfully used as input to downstream protein prediction tools (78; 37-39; 41; 42; 44; 64; 14; 25; 27; 63; 72). While some pLM-based methods do not reach the performance of MSA-based methods (23; 38; 72), others exceed those (5; 10; 23; 39; 42; 64; 28; 29; 36). Prediction performance has risen so much that sequence-specific predictions based on pLMs can capture some aspects of structural and functional dynamics better than much more accurate family-averaged solutions even from AlphaFold2 (35; 72; 74).…”
Section: Introduction (mentioning)
confidence: 98%
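The pattern described in the citation above — frozen per-residue pLM embeddings pooled into a fixed-length vector and fed to a lightweight downstream predictor — can be sketched minimally. Everything here is a placeholder: the embedding function mocks what a real pLM such as ESM-2 would return, and the linear "head" uses random weights rather than trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 32  # real pLMs use larger dims (e.g. 1280); small here for speed

def mock_plm_embedding(seq: str) -> np.ndarray:
    """Stand-in for per-residue pLM embeddings, shape (len(seq), EMB_DIM).

    In practice this would come from a frozen protein language model.
    """
    return rng.standard_normal((len(seq), EMB_DIM))

def mean_pool(emb: np.ndarray) -> np.ndarray:
    """Mean-pool over residues to get one fixed-length protein vector."""
    return emb.mean(axis=0)

# Downstream heads are often just a small linear layer on top of the
# frozen embedding; a random weight matrix stands in for a trained one.
W = rng.standard_normal((EMB_DIM, 2))  # 2 hypothetical output classes

def predict(seq: str) -> int:
    """Classify a sequence from its pooled (mock) embedding."""
    logits = mean_pool(mock_plm_embedding(seq)) @ W
    return int(np.argmax(logits))

label = predict("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print("predicted class:", label)
```

The appeal of this design, as the quoted statements note, is that the expensive sequence representation is computed once per protein (no MSA search), while the task-specific head stays small and cheap to train.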
“…It is thus becoming clear that pLM embeddings are rich inputs to downstream prediction methods of protein function and structure competing with those that exploit MSAs. However, what is encoded in embeddings, including how much evolutionary information as defined in MSAs (i.e., residue co-evolution) is implicitly captured, remains a subject of debate, even in light of correlation between MSAs and pLM embeddings in accuracy for 3D structure prediction [40].…”
Section: Moving From Physicochemical Functions To Deep Neural Network... (mentioning)
confidence: 99%