2009
DOI: 10.1002/cem.1225
Repeated double cross validation

Abstract: Repeated double cross validation (rdCV) is a strategy for (a) optimizing the complexity of regression models and (b) realistically estimating the prediction errors obtained when the model is applied to new cases (within the population of the data used). This strategy is suited for small data sets and is complementary to bootstrap methods. rdCV is a formal, partly new combination of known procedures and methods, and has been implemented in a function for the programming environment R, providing severa…
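The abstract's two-level scheme (an inner cross-validation loop that chooses model complexity, an outer loop that estimates prediction error, both repeated with fresh random splits) can be sketched in plain Python. Everything here is a hypothetical illustration: `error_fn` is a stand-in for fitting a model of a given complexity (e.g. number of PLS components) and scoring it, and the fold counts are arbitrary, not the paper's recommended settings.

```python
import random

def k_folds(indices, k, rng):
    """Shuffle a copy of the index list and split it into k disjoint folds."""
    idx = indices[:]
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_double_cv(n_samples, complexities, error_fn,
                       n_repeats=2, outer_k=4, inner_k=3, seed=1):
    """Return one test-fold error per (repetition, outer fold).

    error_fn(train_idx, test_idx, complexity) -> error is a hypothetical
    user-supplied callable standing in for model fitting and scoring.
    """
    rng = random.Random(seed)
    test_errors = []
    for _ in range(n_repeats):                          # repetition loop
        outer = k_folds(list(range(n_samples)), outer_k, rng)
        for i, test in enumerate(outer):                # outer CV loop
            calib = [j for f, fold in enumerate(outer) if f != i for j in fold]
            # inner CV on the calibration set selects the model complexity
            inner = k_folds(calib, inner_k, rng)
            def inner_err(c):
                return sum(
                    error_fn([j for g, f2 in enumerate(inner) if g != m
                              for j in f2], fold, c)
                    for m, fold in enumerate(inner)) / inner_k
            best = min(complexities, key=inner_err)
            # refit on the whole calibration set; the outer test fold played
            # no part in choosing `best`, so its error estimate is untouched
            # by the complexity selection
            test_errors.append(error_fn(calib, test, best))
    return test_errors

# toy error function: complexity 2 is optimal by construction
toy_error = lambda train, test, c: abs(c - 2) + 1.0 / len(train)
errs = repeated_double_cv(24, [1, 2, 3], toy_error)
```

With 2 repetitions and 4 outer folds, the sketch yields 8 test-fold errors; their spread is what rdCV uses to characterize the prediction-error distribution rather than reporting a single number.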

Cited by 412 publications (334 citation statements); References 28 publications.
“…While each time one fold was withheld in the outer loop as an independent test set (test fold) to compute unbiased estimates of prediction performance, parameter selection was carried out by 4-fold inner cross-validation on the remaining four folds (training folds). Once the best parameters had been found for the four training folds of the outer cross-validation loop, a model was trained on these four folds using the optimized parameters and applied to the withheld test fold to estimate the prediction performance on independent data (45,46). This was applied to all five possible combinations of alternating splits (5-fold cross-validation), finally resulting in five sets of markers for any tested condition.…”
Section: Methods
Mentioning confidence: 99%
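The scheme quoted above (5-fold outer cross-validation for error estimation, 4-fold inner cross-validation for parameter selection) is what scikit-learn calls nested cross-validation. A minimal sketch, assuming scikit-learn is available and substituting synthetic data and a ridge regressor for the citing paper's actual data and P-SVM model:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# synthetic regression data standing in for the real measurements
X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

inner = KFold(n_splits=4, shuffle=True, random_state=1)  # parameter selection
outer = KFold(n_splits=5, shuffle=True, random_state=2)  # error estimation

# the inner 4-fold CV picks alpha on the four training folds only
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]},
                      cv=inner, scoring="neg_mean_squared_error")
# the outer 5-fold CV scores the tuned model on each withheld test fold
scores = cross_val_score(search, X, y, cv=outer,
                         scoring="neg_mean_squared_error")
```

Each of the five outer scores comes from a test fold that was never seen during parameter selection, which is exactly why the quoted authors call the resulting estimates unbiased.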
“…The P-SVM has been designed for exactly such purposes and has proven to be highly successful for analyzing high-dimensional molecular data (56). The training of the models, along with the selection of optimal hyperparameters, was done by nested cross-validation in order to facilitate model selection while still obtaining unbiased estimates of the generalization performance (45,46).…”
Section: Fig
Mentioning confidence: 99%
“…The leave-one-out (LOO) cross-validation method [29][30][31][32] was used to find out the optimal set of spectral lines and to estimate the regression model predictivity. We used spectra of 36 pellets with nine different heating values for calibration, and for testing, we left out a subset of spectra of four pellets of the same sample having the known heating value from calorimetric measurements.…”
Section: Cross-validation
Mentioning confidence: 99%
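Leave-one-out cross-validation, as used in the quote above, holds out each sample in turn, fits on the remaining samples, and averages the squared prediction errors. A self-contained sketch with hypothetical helper names (`loo_cv_mse`, `fit_line`) and a toy univariate least-squares model in place of the citing paper's spectral regression:

```python
def loo_cv_mse(x, y, fit, predict):
    """Leave-one-out CV: hold each sample out once, fit on the rest,
    and average the squared prediction errors."""
    errs = []
    for i in range(len(x)):
        xtr, ytr = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        model = fit(xtr, ytr)
        errs.append((predict(model, x[i]) - y[i]) ** 2)
    return sum(errs) / len(errs)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (toy model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
         / sum((xi - mx) ** 2 for xi in xs))
    return a, my - a * mx

predict_line = lambda m, xi: m[0] * xi + m[1]

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 1.0, 2.1, 2.9, 4.2, 5.0]
mse = loo_cv_mse(x, y, fit_line, predict_line)
```

With n samples, LOO fits n models; each training set differs from the full data by a single point, which is the "too small a perturbation" criticism quoted in the next statement.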
“…Many authors favor cross-validation as it 'gives a reliable picture with no apparent systematic over- or underestimation' [31]; 'overfitting is avoided by the repeated double cross-validation approach' [32]; 'LOO gives too small a perturbation to the data, so that Q² approaches the properties of R² asymptotically' [33]; 'The cross-validation estimate of prediction error is nearly unbiased but can be highly variable.' [34].…”
Section: Use Of Alternative Training-test Set Splits
Mentioning confidence: 99%
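The Q² statistic contrasted with R² in the quote above is the cross-validated coefficient of determination, commonly computed as Q² = 1 − PRESS/TSS, where PRESS sums squared errors of predictions made for samples the model never saw. A minimal sketch (the function name `q2` is my own; the toy numbers are illustrative only):

```python
def q2(y, y_cv_pred):
    """Cross-validated coefficient of determination Q^2 = 1 - PRESS/TSS.

    y_cv_pred[i] must be the prediction for sample i from a model that was
    fitted without sample i (e.g. obtained by LOO or k-fold CV).
    """
    ybar = sum(y) / len(y)
    press = sum((p - t) ** 2 for p, t in zip(y_cv_pred, y))  # CV residuals
    tss = sum((t - ybar) ** 2 for t in y)                    # total variance
    return 1 - press / tss

y     = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.2, 1.9, 3.1, 3.8, 5.2]   # toy cross-validated predictions
score = q2(y, y_hat)                 # ≈ 0.986
```

Because PRESS uses out-of-sample residuals, Q² ≤ R² in practice; the quoted criticism of LOO is that its tiny perturbation makes the gap between the two vanish asymptotically.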