2022
DOI: 10.48550/arxiv.2205.13760
Preprint

Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

Abstract: The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses and designing novel biotherapeutic proteins. Deep generative models of protein sequences trained on multiple sequence alignments have been the most successful approaches so far to address these tasks. The performance of these methods is however contingent on the availability …

Citations: cited by 11 publications (18 citation statements)
References: 95 publications
“…The next best-performing model was ESM-MSA with 64.18%, with ESM-1v and single sequence AlphaFold2 scoring the lowest in this metric (48.06% and 44.37% respectively). Overall, the accuracy achieved by NeuroFold represents a significant improvement over the typical success rates of ∼14.17% achieved using traditional methods (Notin et al, 2022; Table S2). The precision of NeuroFold on each dataset was also independently computed (Figure 4).…”
Section: Neurofold Architecture (mentioning)
confidence: 87%
“…NeuroFold was tested on experimentally validated proteins from the ProteinGym dataset (Notin et al, 2022), TAPE stability dataset (Rao, 2019; Rocklin et al, 2017), as well as additional datasets collected from various studies (Johnson et al, 2023; Madani et al, 2023; Repecka et al, 2021; Russ et al, 2020). The subset of proteins used from the ProteinGym dataset was carefully curated so as to exclude experimental data from proteins with irrelevant non-enzymatic functions in processes such as viral replication and protein-protein interactions.…”
Section: Neurofold Architecture (mentioning)
confidence: 99%
“…Recent studies have used experimental data to evaluate the performance of PLMs in predicting the functional effects of variants [17,20,21,24,25]. However, to date, only one study ('ESM variant') has used a PLM to predict the clinical significance of a mutation [22].…”
Section: Introduction (mentioning)
confidence: 99%