2014
DOI: 10.1039/c4ay01240b

Classification of multi-family enzymes by multi-label machine learning and sequence-based descriptors

Abstract: Multi-family enzymes are of great importance in life, disease and other domains. However, in terms of the classification of enzymes, the information of multi-family enzymes is always removed from the dataset to account for the limitation of traditional single-label prediction methods. In order to predict multiple classes of multi-family enzymes, we adopted two multi-label learning algorithms, namely RAkEL-RF and MLKNN, and two types of protein descriptors, namely CTD and PseAAC, to generate four predictors, RA…
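
As a rough illustration of what a sequence-based descriptor such as CTD (composition, transition and distribution) computes, the sketch below derives the composition and transition fractions of a protein sequence for a single physicochemical property. It is not the authors' implementation: the three-class hydrophobicity grouping in `GROUPS` is an assumed, commonly used grouping, the function name `ctd_composition_transition` is hypothetical, and the distribution part of CTD is left out for brevity.

```python
from itertools import combinations

# Assumed three-class hydrophobicity grouping, for illustration only.
# The grouping actually used in the paper is not given in this text.
GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}

# Map each amino acid letter to the name of its class.
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def ctd_composition_transition(sequence):
    """Return (composition, transition) fractions for one property.

    composition[c]      = fraction of residues that belong to class c
    transition[(a, b)]  = fraction of adjacent residue pairs that switch
                          between class a and class b, in either direction
    Letters outside the 20 standard amino acids are skipped.
    """
    classes = [CLASS_OF[aa] for aa in sequence.upper() if aa in CLASS_OF]
    n = len(classes)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")

    composition = {name: classes.count(name) / n for name in GROUPS}

    adjacent = list(zip(classes, classes[1:]))
    transition = {}
    for a, b in combinations(GROUPS, 2):
        switches = sum(1 for x, y in adjacent if {x, y} == {a, b})
        transition[(a, b)] = switches / (n - 1) if n > 1 else 0.0

    return composition, transition


if __name__ == "__main__":
    comp, trans = ctd_composition_transition("RGCR")
    print(comp)
    print(trans)
```

In a multi-label setting, a feature vector like this would be paired with a set of enzyme classes per sequence rather than a single class, which is the case that learners such as RAkEL-RF and MLKNN are designed to handle.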

Cited by 10 publications (8 citation statements)
References 57 publications
“…The results showed that the ability to predict the enzyme in the subclass of oxidoreductases is somewhat poor, with a success rate by the jackknife test of 86.7%. In 2014, Wang et al [34] adopted several different methods in feature extraction and classification of feature extraction methods to match one classification method and compared the prediction results of four prediction models. They found that the best prediction model is the combination of RAkEL-RF and CTD, with which the highest accuracy with 10-fold cross validation of the training data reached 97.99% and the test data reached 97.57%.…”
Section: B Comment On Published Results (mentioning)
confidence: 99%
“…For instance, Shen et al [33] combined functional domain (FunD) and pseudo position-specific scoring matrix (PsePSSM) to extract features in 2009. Wang et al [34] combined composition, transition and distribution (CTD) and pseudo-amino acid composition (PseAAC) to extract features and classify sequences with the combination of the methods of random-k-label-random forest (RAkEL-RF) and multi-label KNN (MLKNN) in 2014. In 2019, Ryu et al [35] used DeepEC, consisting of three different convolutional neural network (CNN) structures in enzyme classification.…”
Section: Introduction (mentioning)
confidence: 99%
“…These enzymes have been well identified and characterized in plants, bacteria and fungi, and are engaged as an industrially important biocatalyst for the production of bulk and fine chemicals. For example, mandelonitrile could be hydrolyzed to optically pure (R)-(-)-mandelic acid, which is widely used for the production of semisynthetic cephalosporins, penicillins, antitumor agents, and anti-obesity agents (Wang et al 2014). Researchers have revealed that nitrilases play a vital role in various biological processes and plant-microbe interaction, but despite their valuable importance they are relatively less explored for their metabolic functions.…”
Section: Introduction (mentioning)
confidence: 99%
“…Combination of sequence, structure, and chemical properties of enzymes was also explored by Borgwardt et al (2005) using kernel methods and SVM on the BRENDA database and achieved an accuracy of 93% with six-fold cross-validation on information extracted through protein graph models. Multi-label classification using different methods such as RAkEL-RF and MLKNN (Wang et al, 2014) or MULAN (Zou et al, 2013) was performed on single- and multi-labeled enzymes. In particular, the latter was assessed on enzymes from the Swiss-Prot database based on their amino acid composition and their physico-chemical properties and involved the use of position-specific scoring matrices.…”
Section: Introduction (mentioning)
confidence: 99%