2014
DOI: 10.1371/journal.pone.0086703

Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naïve Bayes

Abstract: Developing an efficient method for identifying DNA-binding proteins is highly desirable because of their vital roles in gene regulation, and such a method would be invaluable for advancing our understanding of protein function. In this study, we propose a new method for predicting DNA-binding proteins that ranks features using random forest and then performs wrapper-based feature selection with a forward best-first search strategy. The features comprise information from primary sequence, predict…
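The two-stage selection described in the abstract (random forest ranking, then a wrapper search) can be sketched as follows. This is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. The random forest importance scores are taken as given input (scikit-learn's `RandomForestClassifier`, for example, exposes them as `feature_importances_`). The ranking is assumed to be used only to keep the top-ranked features as the candidate pool, and the wrapper score is assumed to be k-fold cross-validated accuracy of a Gaussian naïve Bayes classifier. The helper names (`top_ranked`, `cv_accuracy`, `best_first_forward`) and the `max_stale` stopping rule are hypothetical.

```python
import heapq
import math
import random
from statistics import mean, pvariance


def top_ranked(importances, n_keep):
    """Return indices of the n_keep features with the highest importance.

    Assumption: `importances` comes from a random forest; it is taken as
    input here rather than computed.
    """
    order = sorted(range(len(importances)), key=lambda j: -importances[j])
    return order[:n_keep]


def gnb_fit(X, y, feats):
    """Fit Gaussian naive Bayes: per-class log prior plus per-feature mean/variance."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        # a tiny variance floor keeps constant features from dividing by zero
        stats = [(mean(col), pvariance(col) + 1e-9)
                 for col in ([r[j] for r in rows] for j in feats)]
        model[c] = (math.log(len(rows) / len(y)), stats)
    return model


def gnb_predict(model, x, feats):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_post(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            for j, (mu, var) in zip(feats, stats))
    return max(model, key=log_post)


def cv_accuracy(X, y, feats, k=5):
    """Assumed wrapper score: k-fold cross-validated accuracy of Gaussian NB."""
    if not feats:
        return 0.0
    # assign folds round-robin within each class so folds stay roughly stratified
    per_class, folds = {}, []
    for label in y:
        folds.append(per_class.get(label, 0) % k)
        per_class[label] = per_class.get(label, 0) + 1
    correct = 0
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        model = gnb_fit([X[i] for i in train], [y[i] for i in train], feats)
        correct += sum(gnb_predict(model, X[i], feats) == y[i]
                       for i in range(len(y)) if folds[i] == f)
    return correct / len(y)


def best_first_forward(X, y, candidates, k=5, max_stale=5):
    """Forward best-first search over subsets of `candidates`.

    Always expands the highest-scoring open subset (so weaker branches can be
    revisited later) and stops after `max_stale` consecutive expansions that
    do not improve the best score (a hypothetical stopping rule).
    """
    start = frozenset()
    best_set, best_score = start, 0.0
    open_heap = [(0.0, 0, start)]  # (negated score, tie-breaker, subset)
    visited, tick, stale = {start}, 1, 0
    while open_heap and stale < max_stale:
        _, _, subset = heapq.heappop(open_heap)
        improved = False
        for f in candidates:
            child = subset | {f}
            if child in visited:
                continue
            visited.add(child)
            score = cv_accuracy(X, y, sorted(child), k)
            heapq.heappush(open_heap, (-score, tick, child))
            tick += 1
            if score > best_score:
                best_set, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return sorted(best_set), best_score


if __name__ == "__main__":
    # synthetic data: feature 0 separates the classes, features 1-3 are noise
    rng = random.Random(0)
    y = [i % 2 for i in range(60)]
    X = [[label * 6 + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)]
         for label in y]
    pool = top_ranked([0.6, 0.1, 0.2, 0.1], n_keep=3)  # hypothetical RF scores
    print(best_first_forward(X, y, pool))
```

Unlike a purely greedy forward pass, the priority queue lets the search return to an earlier, lower-scoring subset when the current branch stops improving, which is what distinguishes best-first from hill climbing.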

Cited by 180 publications (143 citation statements)
References 60 publications
“…As reported in [28], the PDB1075 dataset includes the highest number of protein sequences with low similarity, which is desirable for model evaluation. The other benchmark dataset, PDB186, was recently constructed by Lou et al. [30] and contains 93 DNA-binding and 93 non-DNA-binding proteins, also collected from the PDB. The PDB186 dataset provides an independent test set for validating the predictors.…”
Section: Datasets (mentioning)
confidence: 99%
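Results on a balanced independent test set such as PDB186 are commonly summarized with accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). The sketch below computes these from predicted labels; the function name is hypothetical, the zero-denominator convention for MCC is an assumption, and the predictions in the usage lines are invented purely to exercise the code, not reported results.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity and MCC for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": (tp + tn) / len(pairs),
        "sn": tp / (tp + fn) if tp + fn else 0.0,  # sensitivity (recall on positives)
        "sp": tn / (tn + fp) if tn + fp else 0.0,  # specificity (recall on negatives)
        # assumed convention: MCC is 0 when any marginal count is zero
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical predictions on a balanced 93 + 93 test set (not reported results).
y_true = [1] * 93 + [0] * 93
y_pred = [1] * 80 + [0] * 13 + [0] * 75 + [1] * 18
print(binary_metrics(y_true, y_pred))
```

Because the set is balanced, accuracy and the mean of sensitivity and specificity coincide here; MCC is still useful because it stays informative when the classes are imbalanced.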
“…Given the importance of DNA-binding proteins, methods for identifying members of this protein class are highly desired. In early research, DNA-binding proteins were identified by experimental approaches, typically filter-binding assays, genetic analysis, chromatin immunoprecipitation on microarrays (ChIP-chip), and X-ray crystallography [30]. However, experimental methods are costly in terms of time and resources [21].…”
Section: Introduction (mentioning)
confidence: 99%
“…In particular, a number of recent studies have reported its high efficiency when applied to various fields, such as protein structural class prediction [52], DNA-binding protein prediction [59], cytokine-receptor interaction prediction [28], and cell-penetrating peptide prediction [60].…”
Section: Classifier Selection (mentioning)
confidence: 99%
“…Machine learning algorithms have been employed to construct models for predicting DNA-binding proteins and have achieved effective performance [4–9, 11–19]. Interestingly, the support vector machine (SVM) algorithm has been used frequently to predict DNA-binding proteins [4–6, 8, 12–16]. Cai and Lin first applied the SVM algorithm to DNA-binding protein prediction, using a protein's amino acid composition and a limited range of correlations of hydrophobicity and solvent-accessible surface area as input features [4].…”
Section: Introduction (mentioning)
confidence: 99%
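The sequence features mentioned for Cai and Lin's SVM input can be approximated as below: a 20-dimensional amino acid composition plus short-range hydrophobicity correlations. This is a hedged sketch, not their exact encoding. The Kyte-Doolittle hydropathy scale, the standardization step, the lag-product correlation formula and the names `aa_composition`, `hydrophobicity_autocorr` and `feature_vector` are all assumptions; solvent-accessibility correlations would follow the same pattern with a different per-residue scale and are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed; the original work may use another scale)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def aa_composition(seq):
    """Fraction of each of the 20 standard residues; other letters are ignored."""
    residues = [a for a in seq.upper() if a in KD]
    counts = Counter(residues)
    n = len(residues) or 1
    return [counts[a] / n for a in AMINO_ACIDS]


def hydrophobicity_autocorr(seq, max_lag=3):
    """Mean product of standardized hydropathy values at lags 1..max_lag."""
    vals = list(KD.values())
    mu = sum(vals) / len(vals)
    sd = (sum((v - mu) ** 2 for v in vals) / len(vals)) ** 0.5
    h = [(KD[a] - mu) / sd for a in seq.upper() if a in KD]
    feats = []
    for d in range(1, max_lag + 1):
        pairs = len(h) - d
        # sequences shorter than the lag contribute a neutral 0.0
        feats.append(sum(h[i] * h[i + d] for i in range(pairs)) / pairs
                     if pairs > 0 else 0.0)
    return feats


def feature_vector(seq, max_lag=3):
    """Fixed-length input vector (20 + max_lag values) for a classifier such as an SVM."""
    return aa_composition(seq) + hydrophobicity_autocorr(seq, max_lag)


print(feature_vector("MKRIEQLLAK", max_lag=2))
```

The composition part discards residue order entirely; the lagged correlations reintroduce a small amount of local sequence-order information, which is the motivation for combining the two.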
“…The random forest (RF) algorithm, a useful machine learning classifier, has also been used to predict DNA-binding proteins. Lou et al. applied the RF algorithm to predict DNA-binding proteins using predicted secondary structure, predicted relative solvent accessibility, and the position-specific scoring matrix (PSSM) as features derived from the primary sequence [8]. …”
Section: Introduction (mentioning)
confidence: 99%
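Because a random forest needs a fixed-length input, per-residue predictions like those listed for Lou et al. must first be summarized per protein. One hypothetical way to do this is sketched below; it is not the encoding used in [8]. The assumptions are: the PSSM is an L × 20 list of scores (as produced by a tool such as PSI-BLAST), each score is squashed with a logistic function and averaged over positions, predicted secondary structure is a 3-state string over H/E/C summarized as composition, and predicted relative solvent accessibility is averaged. The function names are illustrative.

```python
import math


def pssm_average(pssm):
    """Average the logistic-transformed PSSM scores over all positions (20 values)."""
    if not pssm:
        return [0.0] * 20
    totals = [0.0] * len(pssm[0])
    for row in pssm:
        for j, score in enumerate(row):
            totals[j] += 1.0 / (1.0 + math.exp(-score))
    return [t / len(pssm) for t in totals]


def ss_composition(ss):
    """Fractions of helix (H), strand (E) and coil (C) in a predicted 3-state string."""
    n = len(ss) or 1
    return [ss.count(state) / n for state in "HEC"]


def protein_features(pssm, ss, rsa):
    """Concatenate PSSM profile, SS composition and mean predicted RSA (24 values)."""
    mean_rsa = sum(rsa) / len(rsa) if rsa else 0.0
    return pssm_average(pssm) + ss_composition(ss) + [mean_rsa]


# Toy two-residue protein with a flat (all-zero) PSSM.
print(protein_features([[0.0] * 20, [0.0] * 20], "HE", [0.1, 0.5]))
```

Averaging loses positional detail but yields the same vector length for proteins of any length, which is the property the forest requires.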