2013
DOI: 10.1186/1471-2105-14-261
A balanced iterative random forest for gene selection from microarray data

Abstract: Background: The wealth of gene expression values generated by high-throughput microarray technologies leads to complex, high-dimensional datasets. Moreover, many cohorts have the problem of imbalanced classes, where the number of patients belonging to each class is not the same. With this kind of dataset, biologists need to identify a small number of informative genes that can be used as biomarkers for a disease. Results: This paper introduces a Balanced Iterative Random Forest (BIRF) algorithm to select the mo…

Cited by 66 publications (47 citation statements)
References 22 publications
“…Since there were more non-ASB events than ASB events, non-ASB events were randomly downsampled to balance the training data set for each tree building process following the balanced random forest approach (34,35). We used a 5-fold cross-validation approach to assess the predictive power of the classifiers.…”
Section: Methods
confidence: 99%
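The per-tree downsampling scheme quoted above (shrink the majority class to the minority-class size independently for each tree, then aggregate votes) can be sketched as follows. This is an illustrative reimplementation of the balanced random forest idea, not the cited authors' code; the class name `BalancedForest` and its parameters are hypothetical.

```python
# Sketch of a balanced random forest: each tree is trained on a sample in
# which every class is randomly downsampled to the minority-class size.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class BalancedForest:
    def __init__(self, n_trees=50, seed=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(seed)
        self.trees = []

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        n_min = counts.min()
        self.trees = []
        for i in range(self.n_trees):
            # downsample every class to the minority-class size
            idx = np.concatenate([
                self.rng.choice(np.flatnonzero(y == c), n_min, replace=False)
                for c in classes
            ])
            tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
            self.trees.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # majority vote over the per-tree predictions
        votes = np.stack([t.predict(X) for t in self.trees]).astype(int)
        return np.apply_along_axis(
            lambda col: np.bincount(col).argmax(), 0, votes)

# toy imbalanced dataset: 170 majority vs 30 minority samples
X = np.random.default_rng(1).normal(size=(200, 20))
y = np.array([0] * 170 + [1] * 30)
clf = BalancedForest(n_trees=25).fit(X, y)
```

The 5-fold cross-validation mentioned in the excerpt would wrap this fit/predict cycle in a stratified split so each fold preserves the class imbalance.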
“…Data from a previous study have shown that the OOB estimate is as accurate as using a test set of the same sample size as the training set [11]. Although RF was more resistant to over-fitting than a support vector machine or artificial neural network, the validation set was involved in the same process simultaneously and an “early-stop” strategy was applied to prevent “over-fitting” [12]. The final set of mRNAs with the smallest estimated OOB error was identified as important features of TIF.…”
Section: Methods
confidence: 99%
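The out-of-bag (OOB) estimate referenced above is built into most random forest implementations: each sample is scored only by the trees whose bootstrap sample excluded it, so no separate test set is needed. A minimal sketch using scikit-learn (an assumption — the cited studies may use different implementations):

```python
# With oob_score=True, the forest reports accuracy measured on the samples
# each tree did NOT see during its bootstrap draw.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=200, oob_score=True,
                                random_state=0).fit(X, y)
print(f"OOB accuracy estimate: {forest.oob_score_:.3f}")
```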
“…Random forest (RF) methods, constructed from decision tree predictors, represent one of the most prevalent supervised machine learning methods; RF was first introduced by Breiman in 2001 [11]. RF methods return measures of variable importance and perform well on the problems that microarray data bring, making them well suited for microarray analysis [12]. An empirical study by Archer et al .…”
confidence: 99%
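The variable importance measure mentioned in the excerpt is what makes RF attractive for gene selection: after training, each feature gets a score that can rank thousands of genes. A sketch using scikit-learn's mean-impurity-decrease importances on a microarray-like shape (few samples, many features); the exact importance measure used by the cited papers may differ.

```python
# Rank features by impurity-based importance on a wide, microarray-like
# dataset: 120 samples, 500 features, only 10 of them informative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                           shuffle=False, random_state=0)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# indices of the ten highest-importance features (candidate "biomarkers")
top = np.argsort(forest.feature_importances_)[::-1][:10]
print("top-ranked feature indices:", top)
```

An iterative scheme like BIRF's would repeat this ranking, drop the lowest-scoring genes, and retrain until a small informative subset remains.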
“…In this process of revisiting some of the key aspects of RF, variable importance measures were also reevaluated, focusing on their performance in extreme cases of recognized limited RF performance, such as highly unbalanced datasets [61]. With this aim, a more robust variable importance indicator was proposed, using a variant of the variable permutation step based on the area under the curve (AUC) [30].…”
Section: Application-driven Improvements to RF Scheme
confidence: 99%
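The AUC-based permutation importance described in the last excerpt can be approximated with scikit-learn's model-agnostic `permutation_importance`, scoring each permutation by ROC AUC instead of accuracy, which is less misleading on imbalanced data. This is a sketch of the idea, not the exact estimator proposed in the cited work.

```python
# Permutation importance scored by ROC AUC on an imbalanced dataset:
# each feature's score is the AUC drop when that column is shuffled.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)  # 90/10 split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
forest = RandomForestClassifier(n_estimators=200,
                                random_state=0).fit(X_tr, y_tr)

result = permutation_importance(forest, X_te, y_te, scoring="roc_auc",
                                n_repeats=10, random_state=0)
print(result.importances_mean.round(3))
```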