2014
DOI: 10.1186/1471-2105-15-298

nDNA-prot: identification of DNA-binding proteins based on unbalanced classification

Abstract: Background: DNA-binding proteins are vital for the study of cellular processes. In recent genome engineering studies, the identification of proteins with certain functions has become increasingly important and needs to be performed rapidly and efficiently. In previous years, several approaches have been developed to improve the identification of DNA-binding proteins. However, the currently available resources are insufficient to accurately identify these proteins. Because of this, the previous research has been …

Cited by 174 publications (88 citation statements)
References 33 publications
“…Since AdaBoost treats misclassified positive and negative examples equally, it cannot support high performance in imbalanced data. Asymboost is an improved algorithm of AdaBoost in which positive samples had a higher price when they were misclassified (Lin et al, 2013, 2014; Song et al, 2014). However, when positive and negative examples had an equal cost for classification, the algorithm was equivalent to AdaBoost.…”
Section: Uci Data (mentioning)
confidence: 99%
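The cost-sensitive idea in the statement above can be sketched in a few lines. This is a minimal illustration, not the formulation used in the cited papers: it assumes the asymmetry enters only through the initial sample weights, and the names `pos_cost`, `initial_weights`, `fit_boost`, and `predict` are hypothetical. With `pos_cost=1.0` the procedure reduces to plain AdaBoost over decision stumps, which mirrors the equivalence the statement describes.

```python
import math


def stump_predict(row, j, thr, pol):
    """Decision stump: predict `pol` when feature j >= thr, otherwise `-pol`."""
    return pol if row[j] >= thr else -pol


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def initial_weights(y, pos_cost=1.0):
    """Assumed asymmetry: positives (label +1) start with `pos_cost` times
    the weight of negatives (label -1); weights are normalized to sum to 1."""
    w = [pos_cost if yi == 1 else 1.0 for yi in y]
    total = sum(w)
    return [wi / total for wi in w]


def fit_boost(X, y, rounds=10, pos_cost=1.0):
    """AdaBoost over stumps; pos_cost=1.0 gives the standard symmetric version."""
    w = initial_weights(y, pos_cost)
    model = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1
```

On the toy data `X = [[0], [1], [2], [3]]` with labels `[-1, 1, -1, -1]`, a single round with `pos_cost=1.0` selects a stump that labels every sample negative, whereas `pos_cost=5.0` selects one that recovers the lone positive at the price of one false positive.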
“…Recently, Song et al [39] reported that the number of non-DNA-binding proteins in datasets far outweighs the number of DNA-binding proteins. They addressed the data imbalance problem by a novel ensemble classifier (imDC; see [39]).…”
Section: Introduction (mentioning)
confidence: 98%
“…To further improve DNA-binding protein prediction from the PseAAC vector, they also combined the PseAAC with a physicochemical distance transformation [27]. Besides PseAAC, DNA-binding proteins are represented by other commonly used sequence-based features, such as physicochemical properties [7,27,39,47], amino acid composition [7,42,49], autocross-covariance transformation [11,12], dipeptide composition [12,32], and other hybrid features [25]. Kumar et al [20] newly incorporated evolutionary information into sequence-based methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…In our future work, we will try other features considering long range sequence information, which is indicted to be useful for the enhancer classification in our current work. We will also try the imbalanced classifiers [52][53][54][55][56] on our dataset, which has been employed CD-HIT and random sampling strategy for the large negative data. Combined with some more sophisticated machine learning models and feature reduction methods [57], we anticipate better performance can be achieved.…”
Section: Results (mentioning)
confidence: 99%