2021
DOI: 10.1101/2021.11.18.468948
Preprint

Accuracy and data efficiency in deep learning models of protein expression

Abstract: Recent progress in laboratory automation has enabled rapid and large-scale characterization of strains engineered to express heterologous proteins, paving the way for the use of machine learning to optimize production phenotypes. The ability to predict protein expression from DNA sequence promises to deliver large efficiency gains and reduced costs for strain design. Yet it remains unclear which models are best suited for this task or what is the size of training data required for accurate prediction. Here we …

Cited by 4 publications (3 citation statements)
References 54 publications
“…It will also be helpful to understand the trade-offs between performance and complexity for specific applications, and studies aimed at exploring this are likely to be valuable. For instance, Nikolados et al 103 compared models of increasing complexity to compare their ability to predict protein expression from DNA sequence. Finally, the amount of data available also places important constraints on whether deep learning approaches are appropriate, as deep models require large training sets.…”
Section: Discussion (mentioning)
confidence: 99%
“…The dataset employed in this study was originally reported in [22]. The data reorganized in a form suitable for machine learning analyses can be found in the Zenodo [50].…”
Section: Data Availability (mentioning)
confidence: 99%
“…Metabolic models of a whole range of biological systems have been particularly useful (Gudmundsson and Nogales, 2021), and they hold a promise for guiding implementation of a suite of biotechnological interventions of the sort advocated below (García‐Jiménez et al., 2021). Finally, of considerable interest to microbial biotechnology is the recent development of ML platforms to support roadmaps for engineering heterologous gene expression workflows (Reis and Salis, 2020; Nikolados et al., 2021). While these approaches currently fail to provide mechanistic insights, they may solve practical problems that are not comprehensible with the principles known at the present time.…”
Section: A Decade and A Half Of Easy DNA Sequencing—and Its Unexpecte... (mentioning)
confidence: 99%