Motivation: Recombinant protein production is a widely used technique in the biotechnology and biomedical industries, yet only a quarter of target proteins are soluble and can therefore be purified. Results: We have discovered that global structural flexibility, which can be modeled by normalised B-factors, accurately predicts the solubility of 12,216 recombinant proteins expressed in Escherichia coli. We have optimised these B-factors, and derived a new set of values for solubility scoring that further improves prediction accuracy. We call this new predictor the ‘Solubility-Weighted Index’ (SWI). Importantly, SWI outperforms many existing protein solubility prediction tools. Furthermore, we have developed ‘SoDoPE’ (Soluble Domain for Protein Expression), a web interface that allows users to choose a protein region of interest for predicting and maximising both protein expression and solubility. Availability: The SoDoPE web server and source code are freely available at https://tisigner.com/sodope and https://github.com/Gardner-BinfLab/TISIGNER-ReactJS, respectively. The code and data for reproducing our analysis can be found at https://github.com/Gardner-BinfLab/SoDoPE_paper_2020. Supplementary information: Supplementary data are available at Bioinformatics online.
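As an illustration of scoring a sequence, or a chosen region of it, with a set of per-residue values, the following is a minimal sketch. It assumes the index is the arithmetic mean of the per-residue weights, which the abstract does not state. The function names `weighted_index` and `best_window` and the toy weights are hypothetical; they are not the published SWI values or the SoDoPE code.

```python
def weighted_index(sequence, weights):
    """Arithmetic mean of per-residue weights over a protein sequence.

    Assumption: the index is a plain average of residue weights.
    `weights` maps one-letter amino acid codes to floats.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(weights)
    if unknown:
        raise ValueError(f"no weight for residues: {sorted(unknown)}")
    return sum(weights[res] for res in seq) / len(seq)


def best_window(sequence, weights, width):
    """Return (start, score) of the highest-scoring window of `width` residues.

    Ties are resolved in favour of the left-most window.
    """
    if width < 1 or width > len(sequence):
        raise ValueError("width must be between 1 and the sequence length")
    scored = [
        (start, weighted_index(sequence[start:start + width], weights))
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy weights for two residues only; purely illustrative values.
    toy_weights = {"A": 1.0, "K": 0.0}
    print(weighted_index("AAKK", toy_weights))
    print(best_window("KKAAKK", toy_weights, 2))
```

Scanning windows in this way mirrors the idea of choosing a protein region of interest, although the web interface may select regions differently.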
Experiments that are planned using accurate prediction algorithms will mitigate failures in recombinant protein production. We have developed TISIGNER (https://tisigner.com) with the aim of addressing technical challenges to recombinant protein production. We offer three web services, TIsigner (Translation Initiation coding region designer), SoDoPE (Soluble Domain for Protein Expression) and Razor, which are specialised in synonymous optimisation of recombinant protein expression, solubility and signal peptide analysis, respectively. Importantly, TIsigner, SoDoPE and Razor are linked, which allows users to switch between the tools when optimising genes of interest.
Recombinant protein production is a key process in generating proteins of interest in the pharmaceutical industry and biomedical research. However, about 50% of recombinant proteins fail to be expressed in a variety of host cells. Here we show that the accessibility of translation initiation sites, modelled using mRNA base-unpairing across the Boltzmann ensemble, significantly outperforms alternative features. This approach accurately predicts the successes or failures of expression experiments, which utilised Escherichia coli cells to express 11,430 recombinant proteins from over 189 diverse species. On this basis, we develop TIsigner, which uses simulated annealing to modify up to the first nine codons of mRNAs with synonymous substitutions. We show that accessibility captures the key propensity beyond the target region (initiation sites in this case), as a modest number of synonymous changes is sufficient to tune the recombinant protein expression levels. We build a stochastic simulation model and show that higher accessibility leads to higher protein production and slower cell growth, supporting the idea of protein cost, where cell growth is constrained by protein circuits during overexpression.
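The optimisation strategy described above, simulated annealing over synonymous substitutions in the first nine codons, can be sketched as follows. This is not the TIsigner implementation. In particular, `toy_score` is a hypothetical stand-in objective (A/T fraction of the editable region) used only so the example runs without an RNA folding library; the tool itself scores accessibility of the translation initiation site. The cooling schedule, step count and function names are likewise assumptions.

```python
import math
import random

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(cds):
    """Translate a coding sequence (length divisible by 3) to protein."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def toy_score(region):
    """Hypothetical objective: A/T fraction. NOT an accessibility measure."""
    return sum(base in "AT" for base in region) / len(region)


def anneal(cds, n_codons=9, steps=2000, t_start=1.0, t_end=0.01,
           seed=0, score=toy_score):
    """Maximise `score` over the first `n_codons` using synonymous changes only."""
    cds = cds.upper()
    if len(cds) % 3 or set(cds) - set(BASES):
        raise ValueError("expected a DNA coding sequence of whole codons")
    rng = random.Random(seed)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    limit = min(n_codons, len(codons))
    # Only codons with at least one synonymous alternative can change
    # (ATG and TGG have none, so a start codon is left untouched).
    editable = [i for i in range(limit)
                if len(SYNONYMS[CODON_TABLE[codons[i]]]) > 1]
    current = list(codons)
    cur_score = score("".join(current[:limit]))
    best, best_score = list(current), cur_score
    if not editable:
        return "".join(best), best_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.choice(editable)
        options = [c for c in SYNONYMS[CODON_TABLE[current[i]]] if c != current[i]]
        candidate = list(current)
        candidate[i] = rng.choice(options)
        cand_score = score("".join(candidate[:limit]))
        delta = cand_score - cur_score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
    return "".join(best), best_score


if __name__ == "__main__":
    example = "ATGGCCGGCCTGCGCAGCGGCCCGGTGTGA"  # made-up sequence
    optimised, value = anneal(example)
    print(optimised, value, translate(optimised) == translate(example))
```

Because every move swaps a codon for a synonym, the encoded protein is preserved by construction, and codons beyond the ninth are never modified.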
Recombinant protein production in microbial systems is well-established, yet half of these experiments have failed in the expression phase. Failures are expected for 'difficult-to-express' proteins, but for others, codon bias, mRNA folding, avoidance, and G+C content have been suggested to explain observed levels of protein expression. However, determining which of these is the strongest predictor is still an active area of research. We used an ensemble average of an energy model for RNA to show that the accessibility of translation initiation sites outperforms other features in predicting the outcomes of 11,430 experiments of recombinant protein production in Escherichia coli. We developed TIsigner and showed that synonymous codon changes within the first nine codons are sufficient to improve the accessibility of translation initiation sites. Our software produces scores for both input and optimised sequences, so that success/failure can be predicted and prevented by PCR cloning of optimised sequences.
Motivation: Signal peptides are responsible for protein transport and secretion and are ubiquitous to all forms of life. The annotation of signal peptides is important for understanding protein translocation and toxin secretion, optimising recombinant protein expression, as well as for disease diagnosis and metagenomics. Results: Here we explore the features of these signal sequences across eukaryotes. We find that different kingdoms have their characteristic distributions of signal peptide residues. Additionally, the signal peptides of secretory toxins have common features across kingdoms. We leverage these subtleties to build Razor, a simple yet powerful tool for annotating signal peptides, which additionally predicts toxin- and fungal-specific signal peptides based on the first 23 N-terminal residues. Finally, we demonstrate the usability of Razor by scanning all reviewed sequences from UniProt. Indeed, Razor is able to identify toxins using their signal peptide sequences only. Strikingly, we discover that many defensive proteins across kingdoms harbour a toxin-like signal peptide; some of these defensive proteins have emerged through convergent evolution, e.g. defensin and defensin-like protein families, and phospholipase families. Availability and implementation: Razor is available as a web application (https://tisigner.com/razor) and a command-line tool (https://github.com/Gardner-BinfLab/Razor).
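To make the notion of a residue distribution over the first 23 N-terminal residues concrete, here is a minimal sketch that computes amino acid frequencies in that region. It only illustrates the kind of feature involved and is not Razor's model; the function name `n_terminal_composition` and the example sequence are hypothetical.

```python
from collections import Counter


def n_terminal_composition(protein, n=23):
    """Relative frequency of each residue in the first `n` residues."""
    region = protein.upper()[:n]
    if not region:
        raise ValueError("protein sequence is empty")
    counts = Counter(region)
    return {residue: count / len(region) for residue, count in counts.items()}


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    example = "MKLLA" + "A" * 30
    for residue, freq in sorted(n_terminal_composition(example).items()):
        print(residue, round(freq, 3))
```

Frequencies computed this way for sets of sequences from different kingdoms could then be compared, which is the kind of contrast the abstract describes.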