2014
DOI: 10.2174/09298665113206660114

Predicting Enzyme Subclasses by Using Random Forest with Multicharacteristic Parameters

Abstract: To predict enzyme subclasses, this paper builds a new enzyme database in terms of previous ideas and methods. Based on protein sequence, a Random Forest algorithm for predicting enzyme subclasses is proposed, with four characteristic parameters selected to describe the sequence information: the increment of diversity value, the low-frequency power spectral density, matrix scoring values, and motif frequency. Using the jackknife test, the overall success rate identifying the 18 subclasses of oxidoreductases, the 8…
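The abstract names the increment of diversity (ID) as one of the characteristic parameters but does not give its formula. The sketch below uses the definition commonly found in the sequence-classification literature, which is an assumption here: for a count vector X with total N, D(X) = N log N − Σ nᵢ log nᵢ, and ID(X, Y) = D(X + Y) − D(X) − D(Y). The function names and the choice of the natural logarithm are illustrative, not taken from the paper.

```python
import math


def diversity(counts):
    """Diversity measure D(X) = N*log(N) - sum(n_i*log(n_i)), with 0*log(0) = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length.

    Small values mean the two sources have similar count distributions.
    """
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


# Identical distributions give 0; disjoint ones give a positive value.
same = increment_of_diversity([1, 1], [1, 1])
disjoint = increment_of_diversity([2, 0], [0, 2])
```

In a classifier built on this measure, a query sequence's count vector would typically be compared against a reference vector per class, with the smallest ID indicating the closest class.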

Cited by 5 publications
(3 citation statements)
“…Considering the local conservation of fold sequences, the sequence of each protein fold was divided into n segments, and in each segment, the occurrence frequencies of 20 amino acid residues in the protein sequences were extracted as a parameter, as previously described (Chen and Li, 2007, Wang et al, 2014). Thus, the initial parameter of each sequence was converted into a 20*n-dimensional vector that was inputted into the ID algorithm for classification, and an improved result was obtained.…”
Section: Methods (mentioning)
confidence: 99%
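The segment-wise encoding described in this citation statement can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name, the near-equal splitting rule, and the use of relative frequencies (rather than raw counts) are assumptions, since the statement only says that the occurrence frequencies of the 20 residues are extracted per segment.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def segment_composition(sequence, n_segments):
    """Return a 20*n_segments vector of per-segment amino acid frequencies."""
    seq = sequence.upper()
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    if len(seq) < n_segments:
        raise ValueError("sequence is shorter than the number of segments")

    features = []
    length = len(seq)
    for k in range(n_segments):
        # Integer arithmetic splits the sequence into nearly equal parts.
        start = k * length // n_segments
        end = (k + 1) * length // n_segments
        segment = seq[start:end]
        # Non-standard symbols (e.g. X) are ignored in the denominator.
        total = sum(segment.count(aa) for aa in AMINO_ACIDS)
        for aa in AMINO_ACIDS:
            features.append(segment.count(aa) / total if total else 0.0)
    return features


vec = segment_composition("AAAACCCC", 2)
```

For this toy input the first segment is all alanine and the second all cysteine, so the vector has 40 entries with a single 1.0 in each block of 20.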
“…As feature parameters, motif information has been successfully applied for the prediction of superfamilies, protein folds, etc. (Ben-Hur and Brutlag, 2003, Liu et al, 2012, Wang et al, 2014).…”
Section: Methods (mentioning)
confidence: 99%
“…For example, some motifs are related to DNA-binding sites and enzyme catalytic sites [29]. As feature parameters, motif information has been successfully applied for the prediction of super family, protein folds, and so forth [27, 28, 30]. Two kinds of motifs were used in this paper: one with biological functions obtained by searching the existing functional motif dataset PROSITE [31] and the other with statistically significant motifs searched by MEME (http://meme.nbcr.net/meme/cgi-bin/meme.cgi).…”
Section: Methods (mentioning)
confidence: 99%
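As a rough illustration of how PROSITE-style functional motifs might be turned into feature values, the sketch below converts a pattern in PROSITE syntax into a regular expression and counts its occurrences in a sequence. The helper names are hypothetical, the conversion covers only the basic syntax (`x`, `[..]`, `{..}`, `(n)` or `(n,m)` repeats, `<` and `>` anchors), and counting overlapping matches is an assumption about how "motif frequency" is defined. Motifs found by MEME are usually reported as position-specific matrices and are not handled here.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern, e.g. 'N-{P}-[ST]-{P}.', to a regex string."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".")                      # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} = any residue but P
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) = repeat range
    regex = regex.replace("<", "^").replace(">", "$")    # terminal anchors
    return regex


def motif_counts(sequence, patterns):
    """Count (possibly overlapping) occurrences of each pattern in the sequence."""
    counts = []
    for pattern in patterns:
        # A lookahead lets matches that overlap each other all be counted.
        lookahead = "(?=(" + prosite_to_regex(pattern) + "))"
        counts.append(sum(1 for _ in re.finditer(lookahead, sequence)))
    return counts


# PS00001 (N-glycosylation site) and a toy spacer pattern.
glyco_regex = prosite_to_regex("N-{P}-[ST]-{P}.")
counts = motif_counts("ANGSAANPTACAAC", ["N-{P}-[ST]-{P}.", "C-x(2)-C"])
```

The resulting count list can be appended to other sequence descriptors to form the input vector for a classifier.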