2010
DOI: 10.1016/j.patrec.2010.03.014

Variable selection using random forests

Abstract: This paper proposes, focusing on random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001, to investigate two classical issues of variable selection. The first one is to find important variables for interpretation and the second one is more restrictive and tries to design a good prediction model. The main contribution is twofold: to provide some insights about the behavior of the variable importance index based on random forests and to …

Cited by 2,023 publications (1,293 citation statements)
References 24 publications
“…In random forest regression, the most widely used index of variable importance is the percentage of increase in mean square error (%IncMSE) following the permutation of a given predictor variable. Higher %IncMSE indicates greater variable importance (Genuer, Poggi, & Tuleau-Malot, 2010). …”
Section: Discussion
Mentioning confidence: 99%
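
The %IncMSE index quoted above is the permutation importance reported by the R randomForest package: permute one predictor, remeasure the mean square error, and report the increase. As a rough illustration only, the Python sketch below computes an analogous permutation-based increase in MSE with scikit-learn; note that scikit-learn's helper scores on a supplied evaluation set rather than the out-of-bag samples, and the data set and parameter values are purely illustrative.

```python
# Hedged sketch: permutation-based variable importance for RF regression,
# analogous in spirit to %IncMSE from the R randomForest package. A held-out
# split stands in for the out-of-bag samples that randomForest uses.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# Permuting an important predictor should increase the MSE; the importance of
# each variable is the mean increase over repeated permutations.
result = permutation_importance(
    rf, X_te, y_te, scoring="neg_mean_squared_error", n_repeats=20, random_state=0
)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"x{idx}: mean MSE increase = {result.importances_mean[idx]:.1f}")
```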
“…The optimal subset is then the subset yielding the smallest error frequency [24] or the smallest area under the curve [14]. An alternative variable selection approach based on a nested collection of random forests is described in Genuer et al [29]. Again, it needs to be emphasized that the resulting model with selected variables needs to be externally validated.…”
Section: Random Forests and Variable Selection
Mentioning confidence: 99%
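
The "nested collection of random forests" mentioned in the statement can be sketched roughly as: rank variables with a first forest's importance scores, fit forests on nested subsets of decreasing importance, and retain the subset with the smallest out-of-bag error. The Python sketch below is a loose illustration of that scheme under stated assumptions, not the authors' exact procedure; it uses scikit-learn's impurity-based importances, 1 minus the OOB R^2 as the error proxy, and synthetic data.

```python
# Hedged sketch of a nested-forests selection loop: rank variables, then fit
# forests on nested subsets ordered by decreasing importance and keep the
# subset with the lowest out-of-bag error. Thresholds and data are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=400, n_features=12, n_informative=4, random_state=1)

# Step 1: rank variables by importance from a first forest
# (impurity-based here, unlike the permutation importance discussed above).
ranker = RandomForestRegressor(n_estimators=300, random_state=1).fit(X, y)
order = np.argsort(ranker.feature_importances_)[::-1]

# Step 2: nested models on the top-1, top-2, ... variables, scored by OOB error.
oob_errors = []
for k in range(1, X.shape[1] + 1):
    rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=1)
    rf.fit(X[:, order[:k]], y)
    oob_errors.append(1.0 - rf.oob_score_)  # 1 - OOB R^2 as an error proxy

best_k = int(np.argmin(oob_errors)) + 1
print("selected variables:", order[:best_k])
```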
“…An RF is essentially an ensemble method that constructs a multitude of decision trees (each of them trained with different subsets of features and examples), and yields the mean prediction of the individual trees [51]. RFs' classification and regression have been applied in different areas of concern in forest ecology, such as modelling the gradient of coniferous species [52], the occurrence of fire in Mediterranean regions [53], the classification of species or land cover type [54,55], and the analysis of the relative importance of the proposed drivers [55] or the selection of drivers [54,56,57]. The selection of RFs in our study is not incidental, and we capitalize on several useful properties.…”
Section: Random Forests Regression
Mentioning confidence: 99%
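
As a minimal illustration of the ensemble-averaging point in the last statement, the sketch below (synthetic data, arbitrary sizes, scikit-learn assumed) checks that a random forest regressor's prediction is the mean of its individual trees' predictions.

```python
# Minimal sketch: a random forest regressor's output equals the mean of the
# predictions of its individual trees. Data and sizes are illustrative only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, random_state=2)
rf = RandomForestRegressor(n_estimators=50, random_state=2).fit(X, y)

forest_pred = rf.predict(X[:3])
tree_mean = np.mean([tree.predict(X[:3]) for tree in rf.estimators_], axis=0)
print(np.allclose(forest_pred, tree_mean))  # True: forest output = mean over trees
```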