2022
DOI: 10.48550/arxiv.2204.01168
Preprint

Few Shot Protein Generation

Abstract: We present the MSA-to-protein transformer, a generative model of protein sequences conditioned on protein families represented by multiple sequence alignments (MSAs). Unlike existing approaches to learning generative models of protein families, the MSA-to-protein transformer conditions sequence generation directly on a learned encoding of the multiple sequence alignment, circumventing the need for fitting dedicated family models. By training on a large set of well-curated multiple sequence alignments in Pfam, …
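To make the conditioning idea concrete, here is a minimal sketch of MSA-conditioned autoregressive generation: an encoder embeds the aligned family sequences, and a decoder samples a new sequence token by token while cross-attending to that encoding. Every name here (MSAToProteinSketch, sample, the vocabulary size and special-token ids) is hypothetical, and the architecture is a generic encoder-decoder stand-in, not the paper's actual model; positional encodings and training code are omitted for brevity.

```python
# Minimal sketch of MSA-conditioned autoregressive generation (hypothetical
# names, generic architecture). The MSA is encoded once; the decoder then
# samples a new family member token by token, cross-attending to that encoding.
import torch
import torch.nn as nn

VOCAB = 25          # 20 amino acids + gap and special tokens (assumed)
D_MODEL = 128
BOS, EOS = 23, 24   # assumed special-token ids

class MSAToProteinSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(D_MODEL, VOCAB)

    def forward(self, msa_tokens, prefix_tokens):
        # msa_tokens:    (batch, n_seqs * length), the MSA flattened into one
        #                token stream (positional encodings omitted for brevity)
        # prefix_tokens: (batch, t), prefix of the sequence being generated
        mask = self.transformer.generate_square_subsequent_mask(prefix_tokens.size(1))
        h = self.transformer(self.embed(msa_tokens), self.embed(prefix_tokens),
                             tgt_mask=mask)
        return self.out(h)  # (batch, t, VOCAB) next-token logits

@torch.no_grad()
def sample(model, msa_tokens, max_len=200):
    """Draw one sequence from the family distribution implied by the MSA."""
    seq = torch.full((1, 1), BOS, dtype=torch.long)
    for _ in range(max_len):
        logits = model(msa_tokens, seq)[:, -1]
        nxt = torch.multinomial(torch.softmax(logits, dim=-1), 1)
        if nxt.item() == EOS:
            break
        seq = torch.cat([seq, nxt], dim=1)
    return seq[0, 1:]  # generated residues, BOS dropped
```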

Cited by 1 publication (2 citation statements) · References 16 publications
“…However, unconditional PLMs suffer from a lack of context, often requiring fine-tuning to specialise their distributions towards a particular protein family of interest [25]. As a result, the leading autoregressive models exploit the information in MSAs to improve predictions, either by biasing language model likelihoods with statistics from the MSA, in the case of Tranception [28], or by explicitly conditioning on the MSA [12,33,42].…”
Section: Related Work
confidence: 99%
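As a rough illustration of the first strategy mentioned in this statement, the sketch below blends per-position log-probabilities from an autoregressive language model with empirical amino-acid frequencies computed from the MSA columns. This shows only the general mechanism of biasing likelihoods with MSA statistics; Tranception's actual retrieval-augmented aggregation differs in detail, and all names here are hypothetical.

```python
# Hedged sketch: bias per-position language-model log-probabilities with
# MSA column statistics (general mechanism only, not Tranception's exact scheme).
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
IDX = {a: i for i, a in enumerate(AA)}

def msa_column_stats(msa, pseudocount=1.0):
    """Empirical amino-acid frequencies per alignment column (gaps ignored)."""
    L = len(msa[0])
    counts = np.full((L, len(AA)), pseudocount)
    for seq in msa:
        for i, a in enumerate(seq):
            if a in IDX:
                counts[i, IDX[a]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def biased_log_probs(lm_log_probs, msa_probs, alpha=0.5):
    """Interpolate LM log-probs with MSA log-frequencies, position by position.

    lm_log_probs: (L, 20) array of log P(a | context) from an autoregressive model.
    msa_probs:    (L, 20) column frequencies from msa_column_stats.
    """
    return (1 - alpha) * lm_log_probs + alpha * np.log(msa_probs)

def score_sequence(seq, lm_log_probs, msa_probs, alpha=0.5):
    """Sum the biased log-likelihood over the residues of `seq`."""
    blended = biased_log_probs(lm_log_probs, msa_probs, alpha)
    return sum(blended[i, IDX[a]] for i, a in enumerate(seq))
```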
“…Family-based protein language models represent the conditional distribution over family members given a subset of other family members [35,12,33,42]. These models have proved especially effective as zero-shot fitness predictors, due to their ability to explicitly condition on evolutionary context to predict the effects of mutations.…”
Section: Scoring Functions For Family-based Protein Language Models
confidence: 99%
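A sketch of how such a family-conditioned model can act as a zero-shot fitness predictor: score mutant and wild type under the same MSA conditioning and take the log-likelihood ratio. The `model` interface below matches the hypothetical sketch after the abstract above (MSA tokens plus a sequence prefix in, next-token logits out); it is an assumed interface for illustration, not a published API.

```python
# Hedged sketch of zero-shot mutation-effect scoring with an MSA-conditioned
# autoregressive model (hypothetical interface, same as the earlier sketch).
import torch

@torch.no_grad()
def conditional_log_likelihood(model, msa_tokens, seq_tokens):
    """Sum of log P(x_t | x_<t, MSA) over a full tokenized sequence."""
    logits = model(msa_tokens, seq_tokens[:, :-1])   # (1, T-1, vocab)
    logp = torch.log_softmax(logits, dim=-1)
    tgt = seq_tokens[:, 1:]                          # next-token targets
    return logp.gather(-1, tgt.unsqueeze(-1)).sum().item()

def mutation_effect(model, msa_tokens, wt_tokens, mut_tokens):
    """Positive score => the model favors the mutant within this family context."""
    return (conditional_log_likelihood(model, msa_tokens, mut_tokens)
            - conditional_log_likelihood(model, msa_tokens, wt_tokens))
```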