Codon optimization with deep learning to enhance protein expression

Fu, Hongguang; Liang, Yongxin; Zhong, Xianqiong; Pan, Zhiling; Huang, Lei; Zhang, Hailin; Xu, Yang; Zhou, Wei; Liu, Zhong

doi:10.1038/s41598-020-74091-z

Cited by 124 publications

(65 citation statements)

References 45 publications


“…New and improved methodologies will continue to be explored to optimize the stability and translation efficiency of mRNA and the delivery of LNP-mRNA complexes. Novel approaches, including deep learning and genome-wide screening method to identify the optimal codon usage and UTR design of mRNA are already being tested empirically [72,115]. Recent studies have screened a library of the total mRNA containing 5'-UTR using computational and empirical analyses and determined the optimal 5'-UTR for the maximum RNA stability and translation efficiency in vitro and in vivo [116,117].…”

Section: Discussion (mentioning)

confidence: 99%


mRNA vaccines for COVID-19: what, why and how

Park, Lagniton, Liu et al. 2021

Int. J. Biol. Sci.


The Coronavirus disease-19 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus -2 (SARS-CoV-2), has impacted human lives in the most profound ways with millions of infections and deaths. Scientists and pharmaceutical companies have been in race to produce vaccines against SARS-CoV-2. Vaccine generation usually demands years of developing and testing for efficacy and safety. However, it only took less than one year to generate two mRNA vaccines from their development to deployment. The rapid production time, cost-effectiveness, versatility in vaccine design, and clinically proven ability to induce cellular and humoral immune response have crowned mRNA vaccines with spotlights as most promising vaccine candidates in the fight against the pandemic. In this review, we discuss the general principles of mRNA vaccine design and working mechanisms of the vaccines, and provide an up-to-date summary of pre-clinical and clinical trials on seven anti-COVID-19 mRNA candidate vaccines, with the focus on the two mRNA vaccines already licensed for vaccination. In addition, we highlight the key strategies in designing mRNA vaccines to maximize the expression of immunogens and avoid intrinsic innate immune response. We also provide some perspective for future vaccine development against COVID-19 and other pathogens.


“…Two additional codon optimization methods involve the use of the codons with human bias and the maximum adaptation index [69,70]. Other bioinformatics approaches can be explored to further enhance the stability of mRNA, e.g., via design of the secondary structures and prediction of the expression level based on deep learning [71,72].…”

Section: Codon Optimization (mentioning)

confidence: 99%

mRNA vaccines for COVID-19: what, why and how

Park, Lagniton, Liu et al. 2021

Int. J. Biol. Sci.

“…[12,16,20,23–25] However, other methods have been proposed [18,26,27]. While classical approaches such as GAs can be highly performant, the fraction of solution space that is sampled in a fixed number of iterations decreases exponentially as the polypeptide chain length grows. Thorough sampling of the solutions space is therefore often intractable with biologically relevant use-cases.…”

Section: Introduction (mentioning)

confidence: 99%

mRNA codon optimization on quantum computers

Fox, Branson, Walker 2021

Preprint


Reverse translation of polypeptide sequences to expressible mRNA constructs is a NP-hard combinatorial optimization problem. Each amino acid in the protein sequence can be represented by as many as six codons, and the process of selecting the combination that maximizes probability of expression is termed codon optimization. This work investigates the potential impact of leveraging quantum computing technology for codon optimization. An adiabatic quantum computer (AQC) is compared to a standard genetic algorithm (GA) programmed with the same objective function. The AQC is found to be competitive in identifying optimal solutions and future generations of AQCs may be able to outperform classical GAs. The utility of gate-based systems is also evaluated using a simulator resulting in the finding that while current generations of devices lack the hardware requirements, in terms of both qubit count and connectivity, to solve realistic problems, future generation devices may be highly efficient.
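The combinatorial growth described in this abstract can be illustrated with a minimal sketch. It assumes the standard genetic code (61 sense codons, with one to six synonymous codons per amino acid); the function names and the example peptides are hypothetical and are not taken from the cited work.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; stop codons are excluded).
CODON_DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(peptide: str) -> int:
    """Return how many distinct coding sequences encode the given peptide."""
    return prod(CODON_DEGENERACY[aa] for aa in peptide.upper())


def sampled_fraction(peptide: str, evaluations: int) -> float:
    """Upper bound on the fraction of the search space that a fixed
    number of candidate evaluations can cover."""
    return min(1.0, evaluations / count_synonymous_sequences(peptide))


if __name__ == "__main__":
    # Hypothetical peptides of increasing length, for illustration only.
    for peptide in ("LRS", "LRSLRS", "LRSLRSLRSLRS"):
        total = count_synonymous_sequences(peptide)
        frac = sampled_fraction(peptide, evaluations=10_000)
        print(f"{peptide}: {total} sequences, fraction covered <= {frac:.3g}")
```

Because the count is a product over residues, it grows exponentially with chain length, so any fixed evaluation budget covers a rapidly shrinking share of the candidates.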


“…Multiple factors are known to influence the outcome of recombinant protein production. These include codon usage of the gene (Fu et al, 2020), expression vector and plasmid design (Rosano and Germán, 2019), host strain design and optimizations, growth media and cultivation conditions, as well as protein recovery method (Zhang et al, 2020). In addition, some proteins can be toxic to the host or aggregate in inclusion bodies (Rosano and Germán, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

“…However, due to the variation in natural proteins, this is not always possible. To handle the variations, multiple growth media and cultivation conditions can be explored, as can optimizations of the genes codon usage to better match the codon usage of the recombinant host (Fu et al, 2020). The above factors and variability in the expression system are expected to have significant impact on the protein expression outcome, and strategies for selecting genes more like to express are needed.…”

Section: Introduction (mentioning)

confidence: 99%

Deep protein representations enable recombinant protein expression prediction

Martiny, Armenteros, Salomon et al. 2021

Preprint


A crucial process in the production of industrial enzymes is recombinant gene expression, which aims to induce enzyme overexpression of the genes in a host microbe. Current approaches for securing overexpression rely on molecular tools such as adjusting the recombinant expression vector, adjusting cultivation conditions, or performing codon optimizations. However, such strategies are time-consuming, and an alternative strategy would be to select genes for better compatibility with the recombinant host. Several methods for predicting expressibility and solubility are available; however, they are all optimized for the expression host Escherichia coli. We show that these tools are not suited for predicting expression potential in the industrially important host Bacillus subtilis. Instead, we build a B. subtilis-specific machine learning model for expressibility prediction. Given millions of unlabelled proteins, and a small labelled dataset, we can successfully train such a predictive model. The unlabelled proteins provide a performance boost relative to using amino acid frequencies of the labelled proteins as input. On average, we obtain a modest performance of 0.64 area-under-the-curve (AUC) and 0.2 Matthews correlation coefficient (MCC). However, we find that this is sufficient to be useful for prioritization of expression candidates. Moreover, the predicted class probabilities are correlated with expression levels. A number of features related to protein expression, including base frequencies and solubility, are captured by the model.
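For readers unfamiliar with the MCC figure quoted in this abstract, the following is a minimal sketch of the standard definition computed from binary confusion-matrix counts. The function name and the counts in the usage lines are hypothetical; they are not data from the cited preprint, and returning 0 when the denominator vanishes is a common convention assumed here.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when any marginal count is zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mcc(tp=50, tn=50, fp=0, fn=0))    # perfect classifier
    print(mcc(tp=25, tn=25, fp=25, fn=25))  # chance-level classifier
```

An MCC near 0.2 therefore indicates a weak but positive association between predicted and observed classes, consistent with the abstract's description of the performance as modest.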

