2022
DOI: 10.1101/2022.04.12.487986
Preprint

Proximal Exploration for Model-guided Protein Sequence Design

Abstract: Designing protein sequences with a particular biological function is a long-lasting challenge for protein engineering. Recent advances in machine-learning-guided approaches focus on building a surrogate sequence-function model to reduce the demand for expensive in-lab experiments. In this paper, we study the exploration mechanism of model-guided sequence design. We leverage a natural property of protein fitness landscape that a concise set of mutations upon the wild-type sequence are usually sufficient to enha…

Cited by 16 publications (53 citation statements)
References 68 publications
“…We show that models can quantitatively predict binding affinities of unseen antibody variants with high accuracy, enabling virtual screenings and augmenting the accessible sequence space by orders of magnitude. In this sense, the trained learner can serve as an oracle, assigning functional annotations from just sequence [17,18]. We confirm predictions and consequent designs in the lab, with a much higher success rate than would be attained with traditional screening.…”
Section: Introduction (supporting)
confidence: 71%
“…As the relationship between the sequence and its fitness can be influenced by various factors, Gaussian process is a popular choice [18], which helps account for the uncertainty in predicting the effects of mutations on protein function to guide the directed evolution of proteins with desired properties. Furthermore, the trained deep neural network can be used to screen large numbers of designed sequences in silico, without the need for wet-lab experiments [19,20,21].…”
Section: Related Work (mentioning)
confidence: 99%
“…Here, S represents the string of amino acids, and L represents the desired length of the sequence. Let's define the protein fitness mapping function as f, which is a black box function that can be evaluated through laboratory experiments [19]. The goal is to maximise f : S^L → R by modifying the starting sequence, s_0, such sequence should occur in nature.…”
Section: Problem Background (mentioning)
confidence: 99%
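
The formulation quoted above (maximise a black-box f : S^L → R starting from s_0), combined with the abstract's observation that a small set of mutations on the wild type is usually enough, can be illustrated with a minimal Python sketch. This is not the method of the paper or of any citing work: `toy_fitness`, `local_search`, the `max_mutations` cap, and the example sequences are all hypothetical names and values chosen for illustration, and the toy fitness stands in for a laboratory measurement or a trained surrogate model.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def toy_fitness(seq, target):
    """Hypothetical stand-in for the black-box f: fraction of positions matching a hidden target."""
    return sum(x == y for x, y in zip(seq, target)) / len(seq)


def local_search(s0, fitness, max_mutations=3, steps=200, seed=0):
    """Hill-climb from s0, rejecting any candidate more than max_mutations away from s0."""
    rng = random.Random(seed)
    best, best_score = s0, fitness(s0)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        cand = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        if hamming(cand, s0) > max_mutations:
            continue  # stay in the neighbourhood of the starting sequence
        score = fitness(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    wild_type = "ACDEFGHIKL"      # assumed starting sequence s_0
    hidden_target = "ACDWFGHYKL"  # assumed optimum, unknown to the search
    best, score = local_search(wild_type, lambda s: toy_fitness(s, hidden_target))
    print(best, score, hamming(best, wild_type))
```

Capping the Hamming distance to s_0 is one simple way to keep the search close to a sequence that occurs in nature; in practice each call to the fitness function would be an experiment or a model prediction, so the number of steps would be budgeted accordingly.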
“…Directed evolution can be combined with computational approaches [25][26][27]. Cheng et al. [28] proposed an efficient, experimental design-oriented closed-loop optimization framework for protein directed evolution, which employs a combination of a novel low-dimensional protein encoding strategy and Bayesian optimization enhanced with search space prescreening via outlier detection. Harteveld et al. [29] proposed a framework to automatically assemble structural templates with native-like features.…”
Section: A Related Work (mentioning)
confidence: 99%