2021
DOI: 10.1093/bioinformatics/btaa1102

SoluProt: prediction of soluble protein expression in Escherichia coli

Abstract: Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritization of highly soluble proteins. R…

Cited by 111 publications (113 citation statements)
References 47 publications
“…Soluble expression of recombinant proteins in E. coli was predicted by Soluprot 1.0 (40). Transmembrane helices were predicted with TMHMM 2.0 (41).…”
Section: Detection Of Exposed Proteins (mentioning)
confidence: 99%
“…The North East Structural Consortium (NESG) expressed 9644 proteins in E. coli using a unified production pipeline (Price et al, 2011) and provide integer scores (0-5) for both expression (E) and solubility (S). The proteins are part of the TargetTrack database, but the scores were obtained by Hon et al [2021] from the original authors. We remove sequences that have multiple scores and use the remaining 9272 sequences in two ways.…”
Section: Price Dataset (mentioning)
confidence: 99%
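
The filtering step described in the statement above (dropping sequences that carry multiple scores) can be sketched as follows. This is a minimal illustration, not the citing authors' code: the record layout `(sequence, expression, solubility)` and the function name `drop_multi_scored` are hypothetical, and "multiple scores" is assumed to mean that the same sequence appears with more than one distinct (E, S) pair.

```python
from collections import defaultdict


def drop_multi_scored(records):
    """Keep only sequences with exactly one distinct (E, S) score pair.

    `records` is an iterable of (sequence, expression, solubility) tuples.
    Assumption: a sequence "has multiple scores" when it occurs with more
    than one distinct (expression, solubility) pair; exact repeats of the
    same pair are treated as a single score.
    """
    scores = defaultdict(set)
    for seq, expression, solubility in records:
        scores[seq].add((expression, solubility))
    # Sequences with conflicting scores are removed entirely.
    return {seq: next(iter(pairs)) for seq, pairs in scores.items() if len(pairs) == 1}


if __name__ == "__main__":
    toy = [("MKT", 3, 4), ("MAA", 5, 0), ("MAA", 2, 0), ("MGG", 1, 1)]
    print(drop_multi_scored(toy))
```

Under a stricter reading, where any repeated sequence is discarded even if its scores agree, the set would be replaced by a list and the same length check applied.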
“…The PaRSnIP (Rawi et al, 2017), DeepSol (Khurana et al, 2018), and SKADE (Raimondi et al, 2020) soluble expression predictors were built using the curated train set and were shown to achieve high scores on the test set. However, it was noticed that these tools generalize poorly (Bhandari et al, 2020, Hon et al, 2021). Raimondi et al [2020] showed that the SKADE model focused mostly on the N- and C- termini and validated that DeepSol did the same using an experiment that involved cropping the starting and ending segments of the sequences.…”
Section: Data (mentioning)
confidence: 99%
“…Instead of using the trial-and-error approach to get enough protein overexpression, tools that can direct the selection of genes with a higher probability of successful overexpression are desirable. Several tools have been developed for the prediction of soluble overexpression in Escherichia coli, including PROSO II (Smialowski et al, 2012), PaRSnIP (Rawi et al, 2018), DeepSol (Khurana et al, 2018), SKADE (Raimondi et al, 2020), and SoluProt (Hon et al, 2021). In addition, some tools exist for the more specific prediction of solubility, which is an important element in soluble protein expression.…”
Section: Introduction (mentioning)
confidence: 99%
“…These include Protein-Sol (Hebditch et al, 2017) and SoDoPe (Bhandari et al, 2020). The mentioned tools use the primary structure as input and calculate various sequence-based features (e.g., hydrophobicity, charge, kmer frequencies, disorder), and they use various machine learning techniques: support vector machines (Agostini et al, 2014), gradient boosting machines (Rawi et al, 2018;Hon et al, 2021), neural networks (Khurana et al, 2018;Raimondi et al, 2020), or other statistical methods (Smialowski et al, 2012;Hebditch et al, 2017;Bhandari et al, 2020). However, all these tools (with the exception of Protein-Sol) have been developed especially with the host Escherichia coli in mind, and it is an open question whether their results can be generalized to other production organisms.…”
Section: Introduction (mentioning)
confidence: 99%
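
To make the sequence-based features named in the statement above concrete (hydrophobicity, charge, k-mer frequencies), here is a minimal sketch of how such features might be computed from a primary structure. It does not reproduce the feature set of SoluProt or any other cited tool: the function name `sequence_features` is hypothetical, hydrophobicity is assumed to be the mean Kyte-Doolittle hydropathy, charge is assumed to be a simple count of K and R minus D and E, and disorder is left out because it requires an external predictor.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq, k=2):
    """Compute simple illustrative features from a protein sequence.

    Returns the mean hydropathy, a crude net charge (K + R - D - E),
    and relative k-mer frequencies. Only the 20 standard residues
    are supported in this sketch.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
    net_charge = sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")
    kmers = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(kmers.values())
    kmer_freq = {kmer: count / total for kmer, count in kmers.items()}
    return {
        "mean_hydropathy": mean_hydropathy,
        "net_charge": net_charge,
        "kmer_freq": kmer_freq,
    }


if __name__ == "__main__":
    print(sequence_features("MKKLLDE"))
```

A feature dictionary of this kind would then be flattened into a fixed-length vector before being passed to a classifier such as a support vector machine or a gradient boosting model.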