2021
DOI: 10.1016/j.cels.2021.07.008

Informed training set design enables efficient machine learning-assisted directed protein evolution

Abstract: Due to screening limitations, in directed evolution (DE) of proteins it is rarely feasible to fully evaluate combinatorial mutant libraries made by mutagenesis at multiple sites. Instead, DE often involves a single-step greedy optimization in which the mutation in the highest-fitness variant identified in each round of single-site mutagenesis is fixed. However, because the effects of a mutation can depend on the presence or absence of other mutations, the efficiency and effectiveness of a single-step greedy wa…

Cited by 148 publications (237 citation statements)
References 61 publications
“…11). This implies that the design of more effective training data should be taken into account when developing ML algorithms to assist protein engineering, especially when the experimental test budget is limited [62].…”
Section: Discussion (mentioning)
confidence: 99%
“…Experimental survey of the fitness landscape of a protein of interest is increasingly used in protein engineering to discover novel sequences with specific functions (Bryant et al, 2021b; Romero and Arnold, 2009; Russ et al, 2020; Wittmann et al, 2021). While this approach remains challenging for proteins with a function that cannot be easily ascertained in a high-throughput manner (Romero and Arnold, 2009), it is likely to be more widely used in the future due to technological advances of experimental (Romero and Arnold, 2009) and analytical (Wittmann et al, 2021; Wu et al, 2019) tools. Our description of heterogeneity of fitness peaks of orthologous GFPs suggests some practical considerations for such surveys of other proteins.…”
Section: Discussion (mentioning)
confidence: 99%
“…Originally, the fitness landscape was introduced to describe the relationship between fitness and the entire genome (de Visser and Krug, 2014; Wright, 1932). Over time, the usefulness of the concept of the fitness landscape led to the adaptation of this term to describe the relationship between protein function and its protein-coding gene sequence (Biswas et al, 2021; Ogden et al, 2019; Romero and Arnold, 2009; Wittmann et al, 2021; Zheng et al, 2020). Absolute knowledge of the fitness landscape would reveal the phenotypes conferred by any arbitrary genotype (de Visser and Krug, 2014; Ferretti et al, 2018; Fragata et al, 2019), with immense and obvious practical implications (Alley et al, 2019; Bryant et al, 2021a; Hirabayashi and Arai, 2019; Kemble et al, 2019; Wrenbeck et al, 2017; Wu et al, 2019).…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, many artificial intelligence techniques have been developed, and they have been widely applied to various relationship analyses and used for the repositioning of FDA-approved drugs [31][32][33]. Therefore, cooperative work with artificial intelligence, particularly in the initial screening for the relationship and in the analysis of the structure-function relationship, should be pursued to elaborate the directed evolution approach to CYP2C8 in the future.…”
Section: Discussion (mentioning)
confidence: 99%