2016
DOI: 10.1186/s12859-016-0900-5

An experimental study of the intrinsic stability of random forest variable importance measures

Abstract: Background: The stability of Variable Importance Measures (VIMs) based on random forest has recently received increased attention. Despite the extensive attention on traditional stability of data perturbations or parameter variations, few studies include influences coming from the intrinsic randomness in generating VIMs, i.e. bagging, randomization and permutation. To address these influences, in this paper we introduce a new concept of intrinsic stability of VIMs, which is defined as the self-consistence among …
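The abstract's notion of intrinsic stability, i.e. how consistent VIM rankings remain when only the internal randomness (bagging, randomization, permutation) changes while data and parameters stay fixed, can be illustrated with a small sketch. The synthetic dataset, the scikit-learn estimator, the impurity-based importance, and the rank-correlation summary below are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Fixed data and parameters: only the forest's internal seed varies between runs.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

def importance_run(seed, n_trees=500):
    """One VIM computation: fit a forest and return its importance vector."""
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    rf.fit(X, y)
    return rf.feature_importances_

# Repeat the VIM computation with different internal seeds only (no data perturbation).
runs = [importance_run(seed) for seed in range(10)]

# Summarise intrinsic stability as the mean pairwise Spearman rank correlation
# between importance vectors from independent repetitions.
cors = [spearmanr(runs[i], runs[j])[0]
        for i in range(len(runs)) for j in range(i + 1, len(runs))]
print("mean pairwise rank correlation:", np.mean(cors))
```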

Cited by 152 publications (90 citation statements)
References 38 publications
“…We used 3,000 trees for each RF model because the stability of variable importance measures increases with the number of trees (Wang et al. 2016).…”
Section: Methods (mentioning)
confidence: 99%
“…The change in model performance when variables are permuted is interpreted as the relative contribution of that variable to the model. We used 3,000 trees for each RF model because the stability of variable importance measures increases with the number of trees (Wang et al. 2016).…”
Section: Model Development (mentioning)
confidence: 99%
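The procedure these statements describe, a forest with a large number of trees and permutation-based importance read off as the drop in predictive performance, might look roughly like the sketch below. The 3,000-tree setting follows the quoted statements; the data and the scikit-learn calls are assumptions for illustration, not the cited authors' code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A large number of trees (3,000, as in the quoted statements) makes the
# importance estimates less sensitive to the forest's internal randomness.
rf = RandomForestClassifier(n_estimators=3000, random_state=1, n_jobs=-1)
rf.fit(X_tr, y_tr)

# Permute each feature on held-out data; the drop in accuracy is read as that
# variable's relative contribution to the model.
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=1)
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {idx}: mean accuracy drop {result.importances_mean[idx]:.4f}")
```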
“…Patients were classified into two groups according to progression disease (unfavorable in GSE49711, n=91, and stage 4 in GSE45480, n=214) or regression disease (favorable in GSE49711, n=181, and stage 4s in GSE45480, n=78). The random forest model (Wang et al., 2016) was then adopted to select cell types that discriminate the two groups, with mean decrease Gini (MDG) and mean decrease accuracy (MDA) used as the measures of importance. Two thirds of the samples were randomly selected for training and the rest were used as a test set to evaluate the importance and the out-of-bag (OOB) value; this step was repeated 100 times, and the averages of MDG, MDA, and OOB were taken as the final result.…”
Section: Random Forest Feature Selection and Risk Score Model (mentioning)
confidence: 99%
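A hedged sketch of the repeated-split procedure quoted above: impurity-based importance stands in for mean decrease Gini (MDG), permutation importance for mean decrease accuracy (MDA), and the forest's out-of-bag score for OOB. Only the 2/3 training split, the 100 repetitions, and the averaging follow the quoted description; the synthetic data and the scikit-learn stand-ins are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the cited study used cell-type features derived from
# GSE49711 / GSE45480, which are not reproduced here.
X, y = make_classification(n_samples=300, n_features=30, n_informative=6, random_state=0)

mdg_runs, mda_runs, oob_runs = [], [], []
for rep in range(100):                                        # 100 random 2/3 vs 1/3 splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=2 / 3, random_state=rep)
    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=rep)
    rf.fit(X_tr, y_tr)
    mdg_runs.append(rf.feature_importances_)                  # MDG stand-in (impurity/Gini)
    mda_runs.append(permutation_importance(rf, X_te, y_te,    # MDA stand-in (accuracy drop)
                                           n_repeats=5, random_state=rep).importances_mean)
    oob_runs.append(rf.oob_score_)                            # out-of-bag accuracy

# Final result: averages over the 100 repetitions, as in the quoted procedure.
mdg_mean = np.mean(mdg_runs, axis=0)
mda_mean = np.mean(mda_runs, axis=0)
print("mean OOB accuracy:", np.mean(oob_runs))
print("top feature by MDG:", mdg_mean.argmax(), "| top feature by MDA:", mda_mean.argmax())
```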
“…Bagging can improve the accuracy of the algorithm because perturbations in the learning set cause changes in predictor construction. Research on the stability of variable importance measures based on the random forest algorithm has recently received considerable attention [10]. In a recent study, variable importance measures are divided into two categories: Mean Decrease Impurity (MDI) and Mean Decrease Accuracy (MDA).…”
Section: Random Forest (mentioning)
confidence: 99%
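The two categories named in this statement can be contrasted directly: Mean Decrease Impurity (MDI) is computed from impurity reductions inside the trees, while Mean Decrease Accuracy (MDA) is computed by permuting a feature and measuring the resulting loss in predictive accuracy. The sketch below uses scikit-learn equivalents and a built-in dataset as an assumed stand-in for the cited study's implementation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

mdi = rf.feature_importances_                        # MDI: impurity-based, from the fit itself
mda = permutation_importance(rf, X_te, y_te,         # MDA: accuracy loss on held-out data
                             n_repeats=10, random_state=0).importances_mean

print("top feature by MDI:", mdi.argmax(), "| top feature by MDA:", mda.argmax())
```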