2021
DOI: 10.1093/bioinformatics/btab463

EPSOL: sequence-based protein solubility prediction using multidimensional embedding

Abstract: Motivation: The heterologous expression of recombinant protein requires host cells, such as Escherichia coli, and the solubility of protein greatly affects the protein yield. A novel and highly accurate solubility predictor that concurrently improves the production yield and minimizes production cost, and that forecasts protein solubility in an E. coli expression system before the actual experimental work, is highly sought. Results: …

citations
Cited by 37 publications
(26 citation statements)
references
References 36 publications
“…Bagging is one of the common ensemble learning models (Dudoit and Fridlyand, 2003; Jin et al, 2019; Jin et al, 2021; Wu and Yu, 2021). The ensemble learning model uses a series of weak learners (also called basic models) for learning and integrates the results of each weak learner to obtain a better learning effect than individual learners.…”
Section: Methods (mentioning)
confidence: 99%
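
The bagging idea described in the statement above can be sketched in a few lines. This is a minimal illustration, not the citing paper's implementation: the weak learner (a one-feature threshold "stump"), the function names `fit_stump`, `fit_bagging` and `predict`, and the toy data are all assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Weak learner (assumed): pick the threshold t that minimises errors
    for the rule 'predict 1 if x > t, else 0' on a 1-D feature."""
    best_t, best_err = xs[0], len(xs) + 1
    for t in xs:
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_bagging(xs, ys, n_learners=25, seed=0):
    """Train each weak learner on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_learners):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict(thresholds, x):
    """Integrate the weak learners' outputs by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): low values are class 0, high values are class 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_bagging(xs, ys)
print(predict(model, 0.05), predict(model, 0.95))
```

Majority voting is only one way to integrate the learners; averaging predicted probabilities is a common alternative.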
“…AAC tried to count the composition information of peptides. In detail, AAC calculates the frequency of occurrence of each amino acid type (Wei et al, 2018a; Liu et al, 2019; Ning et al, 2020; Yang et al, 2020; Zhang and Zou, 2020; Wu and Yu, 2021). The computation formula of AAC is as follows: where L denotes the length of the peptide, which is the number of characters in the peptide, AAC(j) denotes the percentage of amino acid j, N(j) denotes the total number of amino acid j.…”
Section: Methods (mentioning)
confidence: 99%
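
The formula itself did not survive in the excerpt above. From the definitions it gives (L, N(j), AAC(j)), it is presumably AAC(j) = N(j) / L. The sketch below computes it under that assumption; the function name `aac`, the 20-letter alphabet, and returning a fraction rather than a value multiplied by 100 are choices made here, not details confirmed by the citing paper.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide: str) -> dict[str, float]:
    """Return AAC(j) = N(j) / L for every standard amino acid j.

    L is the peptide length and N(j) the number of occurrences of j.
    Non-standard characters count towards L but get no entry of their own.
    """
    peptide = peptide.upper()
    length = len(peptide)
    if length == 0:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return {aa: counts.get(aa, 0) / length for aa in AMINO_ACIDS}


features = aac("ACCA")
print(features["A"], features["C"])
```

The result is a fixed-length vector of 20 values regardless of peptide length, which is what makes AAC convenient as an input feature for a classifier.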
“…Low-rank sparse representation models have been applied in many fields (Cheng et al, 2016; Chen et al, 2017; Zhang et al, 2017; Brbic and Kopriva, 2018; Chen et al, 2018; Xie et al, 2018; Yuanyuan et al, 2018; Zeng et al, 2018; Ding et al, 2019; Shen et al, 2019; Zhang et al, 2019; Li et al, 2020; Wu and Yu, 2021), which demonstrate high superiority, particularly in terms of dimensionality reduction and subspace segmentation. Considering existing analysis methods, introduce a low-rank sparse representation model for gene expression profile data analysis, several new methods for feature selection and feature extraction of gene expression profile data based on low-rank sparse representation models are explored, and they are applied to gene expression profile clustering and classification.…”
Section: Application Of Sparse Representation In Bioinformatics (mentioning)
confidence: 99%