2021 · Preprint
DOI: 10.21203/rs.3.rs-584804/v1
Embeddings from protein language models predict conservation and variant effects

Abstract: The emergence of SARS-CoV-2 variants stressed the demand for tools that interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (LMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids…
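The masked-prediction objective mentioned in the abstract can be illustrated with a short, hedged sketch: it queries a publicly available protein language model (the Rostlab/prot_bert checkpoint via the Hugging Face fill-mask pipeline) for likely residues at a hidden position. The checkpoint name, preprocessing, and sequence are illustrative assumptions, not the paper's own setup.

```python
# Minimal sketch (assumption: Rostlab/prot_bert via the Hugging Face fill-mask pipeline)
# of predicting a masked amino acid from its sequence context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="Rostlab/prot_bert")

# ProtBert expects space-separated single-letter residues; one position is masked.
sequence = "M K T A Y I A K Q R Q I S F V K S H F S R Q L E E R L G L I E V Q"
masked = sequence.replace("A Y I", "A [MASK] I", 1)  # hide one residue (Y)

# Print the model's top candidate residues for the masked position.
for candidate in fill_mask(masked, top_k=5):
    print(candidate["token_str"], round(candidate["score"], 3))
```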


Cited by 7 publications (3 citation statements) · References 52 publications
“…We also experimented with deeper/more sophisticated networks without any gain from more free parameters (data not shown). This confirmed previous findings that simple networks suffice when inputting advanced embeddings ( 37 , 38 , 52 , 71 ). As the network was trained using contrastive learning, no final classification layer was needed.…”
Section: Methods (supporting) · Confidence: 91%
“…We also experimented with deeper/more sophisticated networks without any gain from more free parameters (data not shown). This confirmed previous findings that simple networks suffice when inputting advanced embeddings (9,12,62,75). As the network was trained using contrastive learning, no final classification layer was needed.…”
Section: Methods (supporting) · Confidence: 91%
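The setup quoted above, a small feed-forward network on top of precomputed embeddings, trained with a contrastive objective and therefore without a final classification layer, can be sketched as follows. Dimensions, layer sizes, and the triplet-style loss are illustrative assumptions, not the cited paper's exact architecture.

```python
# Minimal sketch: a small projection network over precomputed protein embeddings,
# trained with a contrastive (triplet) loss, so no classification head is needed.
import torch
import torch.nn as nn

class EmbeddingProjector(nn.Module):
    """Two small linear layers mapping a fixed-size embedding into a contrastive space."""
    def __init__(self, in_dim: int = 1024, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.Tanh(),
            nn.Linear(256, out_dim),  # output is itself an embedding, not class logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = EmbeddingProjector()
loss_fn = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch of precomputed per-protein embeddings (anchor, positive, negative).
anchor, positive, negative = (torch.randn(32, 1024) for _ in range(3))
loss = loss_fn(model(anchor), model(positive), model(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```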
“…We chose ProtT5 over other embedding models, such as ESM-1b (36), based on our experience with the model and comparisons during previous projects (34,38). Furthermore, ProtT5 does not require splitting long sequences, which might remove valuable global context information, while ESM-1b can only handle sequences of up to 1022 residues.…”
Section: Methods (mentioning) · Confidence: 99%
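Extracting ProtT5 embeddings for a full-length sequence, as described in the statement above, can be sketched with the publicly documented encoder checkpoint. The checkpoint name and preprocessing (rare residues mapped to X, space-separated letters) follow the Rostlab model card and are assumptions here, not details taken from the cited paper.

```python
# Minimal sketch (assumption: Rostlab/prot_t5_xl_half_uniref50-enc) of per-residue
# ProtT5 embeddings for a sequence of arbitrary length, without splitting.
import re
import torch
from transformers import T5EncoderModel, T5Tokenizer

model_name = "Rostlab/prot_t5_xl_half_uniref50-enc"
tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
model = T5EncoderModel.from_pretrained(model_name).eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
prepared = " ".join(re.sub(r"[UZOB]", "X", sequence))  # map rare residues, space letters

inputs = tokenizer(prepared, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, length + 1, 1024)

# Drop the trailing special token to keep one 1024-dim vector per residue;
# mean-pooling gives a single per-protein embedding.
per_residue = hidden[0, : len(sequence)]
per_protein = per_residue.mean(dim=0)
print(per_residue.shape, per_protein.shape)
```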