2016
DOI: 10.1007/s11634-016-0276-4

A computationally fast variable importance test for random forests for high-dimensional data

Abstract: Random forests are a commonly used tool for classification with high-dimensional data, as well as for ranking candidate predictors based on so-called variable importance measures. There are different importance measures for ranking predictor variables; the two most common are the Gini importance and the permutation importance. The latter has been found to be more reliable than the Gini importance. It is computed from the change in prediction accuracy when removing any association between the respon…
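The permutation importance mentioned in the abstract can be sketched as follows. This is a simplified in-sample illustration (the measure discussed in the paper is normally computed on out-of-bag samples); the synthetic dataset and forest parameters are illustrative assumptions, not taken from the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Fit a forest and record baseline accuracy.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
baseline = rf.score(X, y)

# For each predictor, shuffle its values to destroy any association with
# the response, then measure the resulting drop in accuracy.
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(baseline - rf.score(X_perm, y))
```

Predictors whose shuffling barely changes accuracy receive importance near zero (or below, due to randomness), which is what motivates the testing procedures discussed in the citing works below.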

Cited by 152 publications (85 citation statements)
References 37 publications
“…The permutation variable importance measure used herein quantifies the loss in skill, the algorithm's ability to predict composite score based on structural indicator variables, by randomly permuting the values of a single predictor variable and comparing that to the unpermuted version. We use this metric to filter structural indicator variables not useful in understanding how structure impacts composite skill, i.e., those with negative values are excluded from further analysis (Janitza et al 2018). Using variable importance scores as a filter and then calculating gain for topmost splitting variables allows us to identify which structural attributes are most relevant to divergence and to quantify this dependence.…”
Section: Simulation Results
confidence: 99%
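The filtering step this excerpt describes, excluding predictors with negative permutation importance, can be sketched in a few lines. The variable names and scores here are hypothetical, chosen only to illustrate the rule.

```python
# Hypothetical permutation importance scores for structural indicator
# variables; negative scores suggest the predictor carries no signal.
importances = {"height": 0.12, "density": -0.03, "cover": 0.07, "age": -0.01}

# Keep only predictors with non-negative importance, as in the excerpt.
kept = [name for name, imp in importances.items() if imp >= 0]
```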
“…In order to finally determine the relevant variables based on their permutation importance, several variable selection strategies have been proposed. In general, the Boruta method 43 and the Vita algorithm 44 can be recommended, as they have been shown to be well balanced in terms of sensitivity and specificity. 45 In the real-world data application, we are able to clarify which of the variables brought up by the original MOB might be truly predictive.…”
Section: Discussion
confidence: 99%
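The Boruta method named in this excerpt can be sketched in simplified form: augment the data with "shadow" copies of each predictor whose values are independently permuted, fit a forest, and keep predictors whose importance beats the best shadow. This is an assumed, one-shot simplification (real Boruta iterates with statistical testing), and the dataset is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=1)

# Shadow features: each column permuted independently, destroying any
# association with the response while preserving marginal distributions.
shadows = rng.permuted(X, axis=0)
X_aug = np.hstack([X, shadows])

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_aug, y)
imp = rf.feature_importances_
real, shadow = imp[:8], imp[8:]

# Keep predictors whose importance exceeds the best-performing shadow.
selected = np.where(real > shadow.max())[0]
```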
“…Random forests are a machine learning technique which can be used to find the variables – here proteins – that allow one to predict which datasets or samples are similar (and which are not; Degenhardt et al, 2019). For variable importance calculation, we employed the method of Janitza et al (2018) as implemented in the ranger package. This method uses a heuristic approach in which a null distribution for p-value calculation is generated from variables with zero or negative importance scores.…”
Section: Methods
confidence: 99%
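The heuristic this excerpt describes can be sketched as follows: the non-positive importance scores are taken as draws from the null, the negative ones are mirrored to the positive side to form a symmetric null distribution, and each predictor's p-value is the fraction of null values at least as large as its score. This is a simplified reading of the quoted description, not the ranger implementation, and the scores are made-up numbers.

```python
import numpy as np

# Hypothetical permutation importance scores for eight predictors.
imp = np.array([0.9, 0.02, -0.05, 0.0, -0.02, 0.5, -0.01, 0.03])

# Null distribution: the non-positive scores plus mirrored copies of
# the negative ones.
nonpos = imp[imp <= 0]
null = np.concatenate([nonpos, -nonpos[nonpos < 0]])

# One-sided p-value: fraction of null values >= the observed score.
pvals = np.array([(null >= v).mean() for v in imp])
```

Predictors with clearly positive importance (like the 0.9 score above) receive a p-value of zero under this finite null, while scores within the noise range receive large p-values.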