2022
DOI: 10.1101/2022.08.05.502972
Preprint

Now What Sequence? Pre-trained Ensembles for Bayesian Optimization of Protein Sequences

Abstract: Pre-trained models have been transformative in natural language, computer vision, and now protein sequences by enabling accuracy with few training examples. We show how to use pre-trained sequence models in Bayesian optimization to design new protein sequences with minimal labels (i.e., few experiments). Pre-trained models give good predictive accuracy at low data and Bayesian optimization guides the choice of which sequences to test. Pre-trained sequence models also obviate the common requirement of finite po…
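
The loop the abstract describes (a predictive ensemble supplies both a prediction and an uncertainty, and an acquisition function picks the next sequence to test) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the preprint's implementation: one-hot features stand in for pre-trained sequence representations, bootstrap ridge regressors stand in for the pre-trained ensemble, the acquisition function is a plain upper confidence bound, candidates come from a fixed list, and the names `one_hot`, `fit_ensemble`, `ucb`, and `propose` are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Flattened one-hot encoding (assumption: stand-in for pre-trained features)."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x.ravel()


def fit_ensemble(X, y, n_models=8, ridge=1.0, seed=0):
    """Fit ridge regressors on bootstrap resamples; returns one weight vector per model."""
    rng = np.random.default_rng(seed)
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))  # sample rows with replacement
        Xb, yb = X[idx], y[idx]
        A = Xb.T @ Xb + ridge * np.eye(X.shape[1])  # ridge term keeps A invertible
        weights.append(np.linalg.solve(A, Xb.T @ yb))
    return weights


def ucb(weights, X_cand, beta=1.0):
    """Upper confidence bound: ensemble mean plus beta times ensemble spread."""
    preds = np.stack([X_cand @ w for w in weights])  # shape (n_models, n_candidates)
    return preds.mean(axis=0) + beta * preds.std(axis=0)


def propose(labeled, candidates, beta=1.0):
    """Return the candidate sequence with the highest acquisition score."""
    X = np.stack([one_hot(s) for s, _ in labeled])
    y = np.array([v for _, v in labeled], dtype=float)
    weights = fit_ensemble(X, y)
    X_cand = np.stack([one_hot(s) for s in candidates])
    return candidates[int(np.argmax(ucb(weights, X_cand, beta)))]


# Toy usage: three measured sequences (made-up labels), three untested candidates.
labeled = [("AAAA", 0.0), ("AAAC", 1.0), ("AACC", 2.0)]
candidates = ["AAAA", "ACCC", "CCCC"]
next_seq = propose(labeled, candidates)
```

In a real campaign the chosen sequence would be measured experimentally, appended to `labeled`, and the loop repeated. Raising `beta` favors candidates on which the ensemble members disagree (exploration); lowering it favors candidates with a high mean prediction (exploitation).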

Citations: Cited by 11 publications (10 citation statements)
References: References 70 publications
“…Supervised neural models are trained on experimental data to predict a regression output [44][45][46][47]. Of particular note is the structural score given by AlphaFold2, which can be used as an optimization objective to find sequences likely to fold into a desired structure [46,48]. If the model is sufficiently good, model outputs can be used as a synthetic replacement for the experimental target.…”
Section: Synthetic Landscapes (mentioning)
confidence: 99%
“…Accordingly, we found that our cRBM model better predicted binding affinity changes upon mutation than ProteinMPNN [58], a recently published structure-based autoregressive graph neural network for sequence design. Synergy between evolutionary-based and structure-based protein design approaches is well-established [68][69][70][71] and therefore, although structure-based computational design methods are rapidly improving [58,72,73], we expect that evolutionary information will still prove valuable in the future.…”
Section: Discussion (mentioning)
confidence: 99%
“…Bayesian optimization is highly effective in striking this balance while being able to also incorporate prior knowledge about the problem, constraints to the search space, and the ability to optimize multiple objectives simultaneously. This has successfully been used in chemical synthesis engineering to reduce the number of experiments required to find the optimal conditions for a reaction and can similarly be applied to protein engineering.…”
Section: Machine-learning Methods For Protein Engineering (mentioning)
confidence: 99%