2021 · Preprint
DOI: 10.1101/2021.05.13.443426
Deep protein representations enable recombinant protein expression prediction

Abstract: A crucial process in the production of industrial enzymes is recombinant gene expression, which aims to make a host microbe overexpress the enzyme-encoding genes. Current approaches for securing overexpression rely on molecular tools such as adjusting the recombinant expression vector, adjusting cultivation conditions, or performing codon optimizations. However, such strategies are time-consuming, and an alternative strategy would be to select genes for better compatibility with the recombinant host. Several…
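
The abstract frames expression prediction as learning a mapping from a sequence-derived representation to an expression outcome. Below is a minimal sketch of that setup, assuming the deep representations are already available as fixed-length vectors; the data, embedding dimension, and classifier choice are placeholders for illustration and not the authors' pipeline.

```python
# Minimal sketch (not the preprint's actual pipeline): predict recombinant
# expression success from fixed-length protein sequence representations.
# The embeddings and labels below are synthetic placeholders; in practice the
# vectors would come from a deep protein representation model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical inputs: one 1280-dimensional embedding per protein sequence,
# plus a binary label (1 = overexpressed in the host, 0 = not).
n_proteins, embed_dim = 500, 1280
X = rng.normal(size=(n_proteins, embed_dim))  # placeholder embeddings
y = rng.integers(0, 2, size=n_proteins)       # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```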

Cited by 3 publications (3 citation statements) · References 40 publications (63 reference statements)

“…The objective of this core is to calculate the embedding representations for protein sequences. For protein sequence encoding/embedding, recent studies have shown the superior performance of deep learning-based methods compared to traditional methods [31, 32]. Accordingly, we only compared one-hot encoding to show the difference between these 2 kinds of embedding in this study.…”
Section: Methods (mentioning)
confidence: 99%
“…The objective of this core is to calculate the embedding representations for protein sequences. For protein sequence encoding/embedding, recent studies have shown the superior performance of deep learning-based methods compared to traditional methods [23, 24]. Accordingly, we only compared one-hot encoding to show the difference between these two kinds of embedding in this study.…”
Section: Proposed Framework (mentioning)
confidence: 99%
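
For context on the comparison the quoted passages describe, here is a small sketch of the one-hot baseline: each residue maps to a 20-dimensional indicator vector over the standard amino acids, in contrast to the dense, context-aware vectors produced by deep protein language models. The padding length and the handling of non-standard residues are illustrative choices, not taken from the cited studies.

```python
# Sketch of the one-hot baseline: a sparse, context-free encoding that deep
# embeddings replace with dense, context-aware vectors.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence: str, max_len: int = 512) -> np.ndarray:
    """Return a (max_len, 20) one-hot matrix; sequences are truncated/padded."""
    matrix = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(sequence[:max_len]):
        if aa in AA_INDEX:  # skip non-standard residues
            matrix[pos, AA_INDEX[aa]] = 1.0
    return matrix

encoded = one_hot_encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(encoded.shape, int(encoded.sum()))  # (512, 20), one 1 per encoded residue
```
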
“…By using the masked language model objective it is able to build a context around each position and learns to “attend” or “focus” on amino acids and peptides that are relevant in the given context. These language models have been found to encode contact maps, taxonomy, and biophysical characteristics in their distributed representations [Rives et al., 2021, Rao et al., 2021, 2020, Elnaggar et al., 2020, Vig et al., 2020, Brandes et al., 2021, Martiny et al., 2021]. In this study, we use a protein language model to predict two objectives, solubility and practical usability for purification of proteins in E. coli, and obtain state-of-the-art performance.…”
Section: Introduction (mentioning)
confidence: 99%
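
The pattern described in this passage, pretraining with a masked language model objective and then reusing the learned representations for downstream prediction such as solubility, can be sketched as follows. The checkpoint name, mean pooling, and downstream setup are assumptions for illustration and are not necessarily what the cited works used.

```python
# Hedged sketch: mean-pool per-residue representations from a pretrained
# protein language model into one vector per sequence, then feed such vectors
# to a downstream classifier (e.g., for solubility). Checkpoint and pooling
# are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "facebook/esm2_t6_8M_UR50D"  # small ESM-2 checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(sequence: str) -> torch.Tensor:
    """Mean-pooled residue embeddings as a fixed-length sequence vector."""
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

vec = embed("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(vec.shape)  # embedding dimension of the chosen checkpoint
# A solubility classifier would then be trained on such vectors plus labels.
```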