2022
DOI: 10.1038/s42256-022-00532-1

Transformer-based protein generation with regularized latent space optimization

Abstract: The development of powerful natural language models has improved the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder, which features a highly structured latent space that is trained to joint…

Cited by 41 publications (53 citation statements)
References 27 publications
Citation statement classifications: 0 supporting, 53 mentioning, 0 contrasting
“…Henderson and Fehr employed VAEs as an information bottleneck regularizer for transformer embeddings and used this model to embed and generate text within a nonparametric space of mixture distributions [55]. Our work is most closely related to the recent work of Castro et al., who introduced the Regularized Latent Space Optimization (ReLSO) approach for data-driven protein engineering [40]. The jointly trained autoencoder (JT-AE) architecture underpinning this approach comprises a transformer encoder, low-dimensional projection into a latent space bottleneck, 1D convolutional neural network (CNN) decoder, and fully-connected network to predict function from the latent space embedding.…”
Section: Results (mentioning, confidence: 99%)
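The JT-AE architecture summarized in the statement above (transformer encoder, latent bottleneck, 1D CNN decoder, fitness head) can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under assumed dimensions, layer counts and pooling choices, not the authors' published implementation.

import torch
import torch.nn as nn

class JTAESketch(nn.Module):
    """Minimal sketch of a jointly trained autoencoder (JT-AE):
    transformer encoder -> latent bottleneck -> 1D CNN decoder,
    plus a fully connected head predicting fitness from the latent code.
    All sizes below are illustrative assumptions."""

    def __init__(self, vocab_size=21, seq_len=100, d_model=64, d_latent=16):
        super().__init__()
        self.seq_len, self.d_model = seq_len, d_model
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Low-dimensional projection into the latent-space bottleneck.
        self.to_latent = nn.Linear(d_model, d_latent)
        self.from_latent = nn.Linear(d_latent, d_model * seq_len)
        # 1D CNN decoder producing per-position amino-acid logits.
        self.decoder = nn.Sequential(
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(d_model, vocab_size, kernel_size=5, padding=2),
        )
        # Fully connected network predicting fitness from the latent embedding.
        self.fitness_head = nn.Sequential(
            nn.Linear(d_latent, 32), nn.ReLU(), nn.Linear(32, 1))

    def encode(self, tokens):                  # tokens: (B, L) integer codes
        h = self.encoder(self.embed(tokens))   # (B, L, d_model)
        return self.to_latent(h.mean(dim=1))   # mean-pool to (B, d_latent)

    def decode(self, z):                       # z: (B, d_latent)
        h = self.from_latent(z).view(-1, self.d_model, self.seq_len)
        return self.decoder(h)                 # logits: (B, vocab_size, L)

    def forward(self, tokens):
        z = self.encode(tokens)
        return self.decode(z), self.fitness_head(z), z

model = JTAESketch()
tokens = torch.randint(0, 21, (8, 100))    # toy batch: 8 sequences of length 100
fitness = torch.randn(8, 1)                 # toy fitness labels
logits, pred, _ = model(tokens)
# Joint training objective: reconstruct the sequence and predict its fitness.
loss = (nn.functional.cross_entropy(logits, tokens)
        + nn.functional.mse_loss(pred, fitness))
loss.backward()

The joint loss is what makes the latent space "structured": the same bottleneck code must support both reconstruction and fitness prediction, so fitness information becomes organized along latent directions.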
“…Another model with generative ability is EVE [100], a VAE used to predict the pathogenicity of protein variants. A particularly interesting application of language models came with ReLSO [101], which used a transformer autoencoder paired with function prediction, inferring protein functionality from sequence embeddings. This model can also be used to generate new sequences by optimizing the latent space with gradient ascent.…”
Section: The Deep Learning Era of Protein Sequence and Structure Gene… (mentioning, confidence: 99%)
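The generation procedure described above, gradient ascent on predicted fitness in latent space followed by decoding, can be sketched as follows. The function assumes a model exposing fitness_head and decode as in the JT-AE sketch earlier; the step count and learning rate are arbitrary illustrative values, not those used in the paper.

import torch

def optimize_in_latent_space(model, z_init, steps=100, lr=0.05):
    """Sketch: climb the predicted-fitness surface in latent space,
    then decode the endpoint into a sequence."""
    z = z_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Ascend predicted fitness by descending its negative.
        (-model.fitness_head(z).sum()).backward()
        opt.step()
    with torch.no_grad():
        logits = model.decode(z)        # (B, vocab_size, L) position-wise logits
        return logits.argmax(dim=1)     # decoded token indices, shape (B, L)

# Starting point: the latent embedding of a known sequence, e.g.
# z_init = model.encode(tokens)
# new_tokens = optimize_in_latent_space(model, z_init)

Note that a regularized latent space is what keeps such gradient walks inside the well-modeled region; the toy sketch above does not itself enforce that regularization.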
“…Protein sequential arrangement or functional motif decomposition was thought to be like human language, which can be organized to represent certain meanings [81]. Accordingly, Natural Language Processing (NLP) methods originally used for human language translation were applied to resolving protein folding and de novo design tasks, by extracting features from protein sequences [22, 82].…”
Section: De Novo Design Inspired by Highly Accurate Protein Modeling (mentioning, confidence: 99%)