2019
DOI: 10.1016/j.ecolmodel.2019.06.002

Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data

Abstract: Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising results in predictive performance of classification problems. While the application of such algorithms has been highly […] predictors. Results show that GAM and Random Forest (RF) (mean AUROC estimates 0.708 and 0.699) outperform all other methods in predictive accuracy. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the […]
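
The tuning-plus-assessment workflow the abstract describes is nested spatial cross-validation: an outer loop of spatially disjoint folds estimates predictive performance, and an inner spatial loop tunes hyperparameters by random search. Below is a minimal Python sketch of that structure, assuming scikit-learn, k-means clusters of the sample coordinates as spatial partitions, and synthetic stand-in data; it is not the authors' implementation (the original study works in R).

```python
# Sketch of nested spatial CV: outer spatial folds give a bias-reduced
# performance estimate; an inner spatial random search (capped at the
# ~50 iterations after which the abstract says tuning gains saturate)
# tunes the Random Forest. Data, block construction and search space
# are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(500, 2))   # sampling locations (x, y)
X = rng.normal(size=(500, 8))                 # environmental predictors
y = rng.integers(0, 2, size=500)              # binary response

# Spatial partitions: k-means clustering of the coordinates
blocks = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)

scores = []
for train, test in GroupKFold(n_splits=5).split(X, y, groups=blocks):
    tuner = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions={"max_features": range(1, 9),
                             "min_samples_leaf": range(1, 11)},
        n_iter=50, scoring="roc_auc",
        cv=GroupKFold(n_splits=4), random_state=0,
    )
    # The inner CV also splits by spatial block, so tuning is not
    # rewarded for exploiting spatial autocorrelation.
    tuner.fit(X[train], y[train], groups=blocks[train])
    scores.append(roc_auc_score(y[test],
                                tuner.predict_proba(X[test])[:, 1]))

print(f"spatial nested-CV AUROC: {np.mean(scores):.3f}")
```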

Cited by 367 publications (232 citation statements) | References 102 publications
“…The results of this study strongly suggest that spatial cross-validation needs to be considered not only for model validation and model tuning (see Schratz et al., 2019, for a study on the relevance of spatial validation for hyperparameter tuning in machine learning applications) but also for variable selection, hence during all steps of model building.…”
Section: Relevance of Spatial Variable Selection
Confidence: 93%
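
The quoted point is that spatial partitioning belongs in variable selection too. As a hypothetical illustration, here is a greedy forward selection in which every candidate variable set is scored by spatially blocked CV; the forward_select helper, the GroupKFold-over-blocks scheme, and the synthetic data are all this sketch's assumptions.

```python
# Hypothetical forward variable selection scored with spatial CV: a
# variable is kept only if it improves spatially validated AUROC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

def forward_select(X, y, blocks, max_vars=4):
    """Greedy forward selection; candidates scored by GroupKFold
    over the spatial partition labels in `blocks`."""
    cv = GroupKFold(n_splits=5)
    selected, remaining, best = [], list(range(X.shape[1])), -np.inf
    while remaining and len(selected) < max_vars:
        scores = {j: cross_val_score(RandomForestClassifier(random_state=0),
                                     X[:, selected + [j]], y, groups=blocks,
                                     cv=cv, scoring="roc_auc").mean()
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:         # no spatially validated gain
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best

rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 6)), rng.integers(0, 2, size=400)
blocks = rng.integers(0, 5, size=400)      # stand-in spatial partition labels
print(forward_select(X, y, blocks))
```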
“…By default, the RF algorithm sets the mtry parameter to the square root of the number of input variables [30]. However, studies have shown that the optimal mtry value can be found below this value [60]. Therefore, we set the upper boundary of the search space for this hyperparameter lower than the proposed square root of the number of predictors.…”
Section: Classifier Algorithm Description and Parameter Tuning
Confidence: 99%
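
In scikit-learn terms the mtry hyperparameter corresponds to max_features. Here is a minimal sketch of the constrained search space the quote describes, assuming p = 16 predictors; the exact upper bound (strictly below the sqrt(p) default) and the tuning setup are illustrative assumptions, not the study's settings.

```python
# Cap the mtry search space below the sqrt(p) default, as in the quote.
import math
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

n_predictors = 16
mtry_default = math.isqrt(n_predictors)   # sqrt(p) = 4, the RF default
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    # randint(1, mtry_default) samples 1..3, i.e. strictly below sqrt(p)
    param_distributions={"max_features": randint(1, mtry_default)},
    n_iter=10, scoring="roc_auc", cv=5, random_state=0,
)
# search.fit(X, y)  # X: 16 predictor columns, y: binary labels
```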
“…Because spatial autocorrelation between training and test sets may produce optimistic bias in assessments of classification performance [39,41,42], we used a spatial leave-one-out cross-validation (SLOO-CV) sampling strategy [54,55] to separate the training and test sets to guarantee full independence between them. In this approach, one reference sample is used as the test set and the remaining samples, non-spatially correlated with the test set, are used as the training set ( Figure 4).…”
Section: Estimating Prediction Errors by Spatial Cross-Validation
Confidence: 99%
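
A hypothetical sketch of the SLOO-CV protocol the quote describes: each sample in turn is the test set, and the training set keeps only samples beyond a distance buffer around it. The buffer radius, learner, and synthetic data are assumptions for illustration, not the cited studies' settings.

```python
# Spatial leave-one-out CV with a distance buffer: the held-out sample's
# spatial neighbours are excluded from training to enforce independence.
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.ensemble import RandomForestClassifier

def sloo_cv_predictions(X, y, coords, buffer_radius):
    """One held-out class probability per sample."""
    dist = cdist(coords, coords)           # pairwise distances between sites
    preds = np.empty(len(y))
    for i in range(len(y)):
        train = dist[i] > buffer_radius    # drop everything inside the buffer
        clf = RandomForestClassifier(random_state=0).fit(X[train], y[train])
        preds[i] = clf.predict_proba(X[i:i + 1])[0, 1]
    return preds

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(200, 2))
X, y = rng.normal(size=(200, 5)), rng.integers(0, 2, size=200)
probs = sloo_cv_predictions(X, y, coords, buffer_radius=20.0)
```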
“…Several drivers of classification errors remain insufficiently explored, among which, spatial autocorrelation of reference data has long been identified but rarely quantified [39,40]. Spatial dependence in the reference data due to an inadequate sampling strategy to split training and validation sets can wrongly increase classification accuracy [39,41,42]. Despite different approaches addressing this issue by imposing a spatial stratification to select samples for training and testing [41,42], the spatial autocorrelation is not always estimated explicitly.…”
Section: Introduction
Confidence: 99%
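
The inflation the quote refers to can be made concrete: when predictors encode location and the response follows a smooth spatial field, random K-fold CV scores well above a spatially blocked split. The synthetic field, the coordinate-leaking predictors, and the k-means blocks below are this demonstration's assumptions.

```python
# Hypothetical demonstration: random vs. spatially blocked CV on
# spatially autocorrelated data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(7)
coords = rng.uniform(0, 100, size=(600, 2))

# Smooth random field: sum of Gaussian bumps at random centres
centres = rng.uniform(0, 100, size=(20, 2))
weights = rng.normal(size=20)
dists = np.linalg.norm(coords[:, None, :] - centres[None, :, :], axis=2)
field = (np.exp(-(dists / 15) ** 2) * weights).sum(axis=1)

y = (field > np.median(field)).astype(int)                # spatially smooth labels
X = np.column_stack([coords, rng.normal(size=(600, 3))])  # coords leak location

blocks = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)
clf = RandomForestClassifier(random_state=0)

random_auc = cross_val_score(clf, X, y, scoring="roc_auc",
                             cv=KFold(5, shuffle=True, random_state=0)).mean()
spatial_auc = cross_val_score(clf, X, y, groups=blocks, scoring="roc_auc",
                              cv=GroupKFold(n_splits=5)).mean()
print(f"random CV AUROC:  {random_auc:.3f}")   # optimistic
print(f"spatial CV AUROC: {spatial_auc:.3f}")  # typically markedly lower
```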