2011
DOI: 10.1109/tcbb.2010.96
|View full text |Cite
|
Sign up to set email alerts
|

Ensemble Learning with Active Example Selection for Imbalanced Biomedical Data Classification

Abstract: Abstract-In biomedical data, the imbalanced data problem occurs frequently and causes poor prediction performance for minority classes. It is because the trained classifiers are mostly derived from the majority class. In this paper, we describe an ensemble learning method combined with active example selection to resolve the imbalanced data problem. Our method consists of three key components: 1) an active example selection algorithm to choose informative examples for training the classifier, 2) an ensemble le… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
14
0
1

Year Published

2012
2012
2024
2024

Publication Types

Select...
5
3

Relationship

0
8

Authors

Journals

citations
Cited by 76 publications
(15 citation statements)
references
References 20 publications
0
14
0
1
Order By: Relevance
“…In SVIS, the authors decreased the size of the classifier committee. Ensemble classifiers were utilized in other algorithms to tackle real-world problems, e.g., selecting refined training sets from biomedical data (Oh et al 2011). …”
Section: Neighborhood Analysis Methodsmentioning
confidence: 99%
See 1 more Smart Citation
“…In SVIS, the authors decreased the size of the classifier committee. Ensemble classifiers were utilized in other algorithms to tackle real-world problems, e.g., selecting refined training sets from biomedical data (Oh et al 2011). …”
Section: Neighborhood Analysis Methodsmentioning
confidence: 99%
“…Therefore, applying an appropriate approach for selecting desired training sets is inevitable. Oh et al (2011) investigated their SVM training set selection using such imbalanced sets for various diseases (leukemia, diabetes, Parkinson's disease, hepatitis, breast cancer and cardiac diseases). These datasets included up to 800 vectors (Diabetes dataset), and the number of features was up to almost 7200 in the Leukemia dataset.…”
Section: Datasets and Practical Applicationsmentioning
confidence: 99%
“…The reason we choose the ensemble learning method is because it is believed to perform well for imbalanced data [29, 30, 32]. We employ an ensemble of 1000 deep trees that have minimal leaf size of 5 with a learning rate 0.1 in RUBoost learning to attain a high ensemble accuracy.…”
Section: Resultsmentioning
confidence: 99%
“…The popular method to solve imbalanced data problem is random re-sampling technique which balances the number of training examples among classes [6]. Common random resampling techniques include the random over sampling (ROS) and the random under sampling (RUS).…”
Section: Related Workmentioning
confidence: 99%
“…In many cases, the user is more interested in minority class. Thus, addressing and solving imbalanced data problem is very critical for improving classification performance [6] Random forest [7] is an ensemble classifier that consists of many decision trees and outputs the class that is the majority of the classes of all the individual trees. The method combines bootstrap and the node randomly split technical to train multiple trees, and the classification result is decided by majority voting.…”
Section: Introductionmentioning
confidence: 99%