2019
DOI: 10.1002/widm.1301

Hyperparameters and tuning strategies for random forest

Abstract: The random forest (RF) algorithm has several hyperparameters that have to be set by the user, for example, the number of observations drawn randomly for each tree and whether they are drawn with or without replacement, the number of variables drawn randomly for each split, the splitting rule, the minimum number of samples that a node must contain, and the number of trees. In this paper, we first provide a literature review on the parameters' influence on the prediction performance and on variable importance measures. …

Cited by 1,071 publications (761 citation statements)
References 51 publications
“…We validated the modular decomposition using permutation testing on three levels: (i) the individual unthresholded FC-matrix and (ii) the co-classification matrix of each participant, as well as (iii) the co-classification matrix at the group level (Dwyer et al. 2014). To avoid leakage of information from the test to the training data, we evaluated the performance of the classifier using (i) a nested cross-validation (leaving out the two observations corresponding to one subject for testing) and (ii) an inner validation approach for the hyperparameter optimization of the random forest classifier (using a sequential model-based optimization implemented by the Scikit-Optimize library; skopt, https://github.com/scikit-optimize/scikit-optimize), iteratively tuning the following parameters in line with the recommendations of Probst et al. (Probst, Wright and Boulesteix 2019): maximum depth of the tree, number of features, minimum number of samples required to split a node, and minimum number of samples required to be at a leaf node. We statistically validated the observed accuracy using permutation testing (p < 0.05, 5,000 iterations), randomizing the class labels.…”
Section: Modularity Analysis
confidence: 99%
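The workflow in the excerpt above (sequential model-based optimization of a random forest classifier inside a nested cross-validation) can be sketched roughly in Python with scikit-optimize. The synthetic data, parameter ranges, and plain 5-fold splits below are illustrative assumptions, not details taken from the cited study, which used leave-one-subject-out splits.

```python
# Sketch of model-based hyperparameter optimization for a random forest
# classifier with scikit-optimize (skopt), wrapped in a nested
# cross-validation. Data and parameter ranges are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from skopt import BayesSearchCV
from skopt.space import Integer

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# Inner loop: sequential model-based optimization over the four parameters
# named in the excerpt (tree depth, number of features per split,
# minimum samples to split a node, minimum samples per leaf).
inner_search = BayesSearchCV(
    RandomForestClassifier(n_estimators=300, random_state=0),
    {
        "max_depth": Integer(2, 30),
        "max_features": Integer(1, 50),
        "min_samples_split": Integer(2, 20),
        "min_samples_leaf": Integer(1, 10),
    },
    n_iter=20,
    cv=5,
    scoring="accuracy",
    random_state=0,
)

# Outer loop: the test folds never take part in the tuning step,
# which is what prevents leakage from test to training data.
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print("Nested-CV accuracy:", outer_scores.mean())
```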
“…We used the open-source R software environment for statistical computing and graphics (version 3.5.0) within an integrated development environment for R, RStudio (RStudio Desktop version 1.1.447), to analyse the data assembled in step V. For regression tasks we used the ranger package (version 0.10.1) as an implementation of random forests (Wright and Ziegler 2017). To obtain the most accurate predictions, the random forest parameters need to be optimised (Probst et al. 2018). To configure the parameters of the random forest, we used the tuneRanger package (version 0.3) (Probst et al. 2018), which provides model-based optimization as the tuning strategy and tunes the three parameters min.node.size, sample.fraction and mtry at once.…”
Section: Spatial Data Analysis
confidence: 99%
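As a rough illustration of the tuneRanger workflow described above: the R package tunes mtry, min.node.size and sample.fraction jointly via model-based optimization. The sketch below approximates this in Python with the closest scikit-learn counterparts (max_features, min_samples_leaf, max_samples) and scikit-optimize; it is an analogue, not the cited R implementation, and the ranges and data are illustrative assumptions.

```python
# Rough Python analogue of tuning min.node.size, sample.fraction and mtry
# at once via model-based optimization, using the closest scikit-learn
# counterparts. Ranges and data are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from skopt import BayesSearchCV
from skopt.space import Integer, Real

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

search = BayesSearchCV(
    RandomForestRegressor(n_estimators=500, random_state=0),
    {
        "max_features": Integer(1, 20),      # roughly mtry
        "min_samples_leaf": Integer(1, 20),  # roughly min.node.size
        "max_samples": Real(0.2, 0.95),      # roughly sample.fraction
    },
    n_iter=20,
    cv=5,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```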
“…To obtain the most accurate predictions, the random forest parameters need to be optimised (Probst et al. 2018). To configure the parameters of the random forest, we used the tuneRanger package (version 0.3) (Probst et al. 2018), which provides model-based optimization as the tuning strategy and tunes the three parameters min.node.size, sample.fraction and mtry at once. Out-of-bag predictions were used for evaluation.…”
Section: Spatial Data Analysis
confidence: 99%
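The excerpt above adds that out-of-bag (OOB) predictions were used for evaluation. The sketch below shows the same idea in Python with scikit-learn, as an analogue of the R ranger setup used in the cited work; the regression data are synthetic and serve only as an illustration.

```python
# Out-of-bag (OOB) evaluation sketch: each observation is predicted only by
# the trees whose bootstrap sample did not contain it, so no separate
# hold-out set is needed. Synthetic data for illustration only.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

rf = RandomForestRegressor(n_estimators=1000, oob_score=True, random_state=0)
rf.fit(X, y)

print("OOB R^2:", rf.oob_score_)
print("OOB MSE:", mean_squared_error(y, rf.oob_prediction_))
```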
“…Compared with logistic regression, random forests require tuning of some of their parameters. We used the function OOBCurve in the homonymous R package (Probst and Boulesteix) to tune the number of trees, whereas we used the latest implementation in the tuneRanger R package (Probst et al.) to tune the number of variables randomly sampled as candidates at each split, the minimal size of terminal nodes, and the fraction of observations to sample (function tuneRanger). In all cases, we chose the area under the curve (AUC) as the performance criterion for tuning.…”
Section: Assessing the Effect of Company and Network Data on Credit Risk
confidence: 99%
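The OOBCurve step described above traces out-of-bag performance as a function of the number of trees. The sketch below approximates that idea in Python with scikit-learn, growing a forest with warm_start and recording OOB AUC at a few forest sizes; it is a rough analogue of the R package, not its implementation, and the data and tree grid are assumptions.

```python
# OOB performance as a function of the number of trees, in the spirit of
# the OOBCurve idea: grow the forest incrementally with warm_start and
# record OOB AUC at each size. Data and tree grid are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, n_features=25, random_state=0)

rf = RandomForestClassifier(oob_score=True, warm_start=True, random_state=0)
for n_trees in (50, 100, 250, 500, 1000):
    rf.set_params(n_estimators=n_trees)
    rf.fit(X, y)  # warm_start adds trees instead of refitting from scratch
    # oob_decision_function_ holds class probabilities from OOB votes only.
    oob_auc = roc_auc_score(y, rf.oob_decision_function_[:, 1])
    print(n_trees, round(oob_auc, 4))
```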