2023
DOI: 10.1038/s41587-022-01618-2

Large language models generate functional protein sequences across diverse families

Abstract: In this study, we address the challenge of consistently following emotional support strategies in long conversations by large language models (LLMs). We introduce the Strategy-Relevant Attention (SRA) metric, a model-agnostic measure designed to evaluate the effectiveness of LLMs in adhering to strategic prompts in emotional support contexts. By analyzing conversations within the Emotional Support Conversations dataset (ESConv) using LLaMA models, we demonstrate that SRA is significantly correlated with a mode…

Cited by 473 publications (363 citation statements)
References 79 publications
“…Typically, these generative protein sequence models are trained either on large collections of protein sequences, e.g. the entire UniProt database of millions of sequences 12–14, or on a set of proteins from a particular family 15,16, with the ultimate goal of learning the distribution of sequences within the training set in order to sample novel sequences with the generalized properties of that distribution. The underlying assumption of generative protein models is that natural proteins are under evolutionary pressure to be functional, so novel sequences drawn from the same distribution will also be functional 17.…”
Section: Main (mentioning; confidence: 99%)
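The train-then-sample idea in the statement above can be illustrated with a deliberately minimal sketch: estimate a distribution over amino-acid tokens from a small training set, then draw novel sequences from it. This is a hypothetical toy (a unigram model standing in for the deep generative models the quote describes), not the method of any cited work; the sequences and function names are invented for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_unigram(sequences):
    """Estimate per-residue frequencies from a training set (add-one smoothed)."""
    counts = {aa: 1 for aa in AMINO_ACIDS}
    for seq in sequences:
        for aa in seq:
            counts[aa] += 1
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def sample_sequence(freqs, length, rng):
    """Draw a novel sequence from the learned distribution."""
    residues = list(freqs)
    weights = [freqs[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))

rng = random.Random(0)
training_set = ["MKTAYIAK", "MKLVINGK", "MKTIIALS"]  # toy data
freqs = train_unigram(training_set)
novel = sample_sequence(freqs, length=8, rng=rng)
```

Real protein language models replace the unigram table with an autoregressive neural network, but the sampling logic (draw tokens from a distribution fit to natural sequences) is the same in spirit.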
“…The underlying assumption of generative protein models is that natural proteins are under evolutionary pressure to be functional, so novel sequences drawn from the same distribution will also be functional 17. Multiple different generative protein models have been proposed, including methods based on deep neural networks such as generative adversarial networks (GANs) 15, variational auto-encoders (VAEs) 16,18, language models 13,14,19–22 and other neural networks 23,24, as well as statistical methods such as ancestral sequence reconstruction (ASR) 25,26 and direct coupling analysis (DCA) 27–29. However, comparing the performance of these methods, i.e.…”
Section: Main (mentioning; confidence: 99%)
“…In the last two years, the introduction of machine learning (ML)-based generative models, inspired by revolutionary deep generative models such as ChatGPT, Stable Diffusion or DALL·E 2 (4–6), has yielded major breakthroughs in protein design and engineering (2, 3, 7–13). These methods have led to great advances in the engineering of new proteins with individual functionality, such as catalytic activity (8–11), small-molecule binding (8, 9) or spike-protein capping (8). However, the design of proteins with emergent functions of core relevance to life, such as spatiotemporal self-organization upon energy dissipation, is still in its infancy.…”
Section: Main (mentioning; confidence: 99%)
“…The interplay between predictive and generative AI models provides informed and efficient sequence design by predicting the outcomes of different peptide/protein sequences. Many comprehensive reviews have referenced the successful applications of ML-guided sequence design to antimicrobial peptides (AMPs) 1–4, protein binders 5, antigen-specific monoclonal antibodies 6–9, protein families 10–15 and enzymes 16–20. The field of ML-guided peptide/protein sequence design is snowballing; the reader is encouraged to consult these two GitHub repositories 21,22 to stay informed.…”
Section: Introduction (mentioning; confidence: 99%)
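The predict-then-select loop summarized in the statement above can be sketched in a few lines: a generative step proposes candidate sequences and a predictive model scores them, so only the most promising candidates advance to the next round. Everything here is a stand-in assumption for illustration: `propose` is a random generator rather than a trained model, and `toy_fitness` is an invented scorer, not a real property predictor.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose(rng, length=10):
    """Generative step (stand-in): propose a random candidate sequence."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def toy_fitness(seq):
    """Predictive step (stand-in): reward hydrophobic residues, purely illustrative."""
    hydrophobic = set("AVILMFWY")
    return sum(aa in hydrophobic for aa in seq) / len(seq)

def design_round(rng, n_candidates=50, keep=5):
    """One round of ML-guided design: generate, score, keep the top candidates."""
    candidates = [propose(rng) for _ in range(n_candidates)]
    candidates.sort(key=toy_fitness, reverse=True)
    return candidates[:keep]

rng = random.Random(42)
best = design_round(rng)
```

In practice the generative step would be a trained sequence model and the scorer a learned predictor of the target property, often iterated over several rounds with experimental feedback.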