2018
DOI: 10.1007/s10994-018-5714-4

Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation

Abstract: Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias C…
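The abstract's core idea — correcting the optimistic bias of the best configuration's cross-validated performance by bootstrapping the pooled out-of-sample predictions — can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the function name `bbc_cv_estimate` is mine, and accuracy is assumed as the performance metric for simplicity.

```python
import numpy as np

def bbc_cv_estimate(oos_preds, y, n_boot=1000, seed=0):
    """Bootstrap bias-corrected performance estimate (sketch).

    oos_preds : (n_samples, n_configs) matrix of pooled out-of-sample
                CV predictions, one column per configuration.
    y         : true labels for the n_samples rows.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap resample of rows
        oob = np.setdiff1d(np.arange(n), idx)   # out-of-bootstrap rows
        if oob.size == 0:
            continue
        # Select the configuration that looks best on the bootstrap sample...
        acc_in = (oos_preds[idx] == y[idx, None]).mean(axis=0)
        best = int(np.argmax(acc_in))
        # ...then score that selection on the held-out rows, so the
        # selection step itself is accounted for in the estimate.
        scores.append(float((oos_preds[oob, best] == y[oob]).mean()))
    return float(np.mean(scores))
```

Because the configuration is chosen and evaluated on disjoint rows within each bootstrap iteration, the averaged score no longer inherits the "winner's curse" of simply reporting the best configuration's CV score.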

Cited by 165 publications (159 citation statements)
References 42 publications
“…44 for an explanation); the optimism is removed using a bootstrap procedure before it is returned, in a similar fashion as in ref. 44. In the end, K performance estimates are computed for each configuration, and the one with the best average performance is selected as the best configuration. A final model is produced by applying the best configuration on the complete set of MOFs.…”
Section: Methods
confidence: 99%
“…The optimism problem has been noted both theoretically and experimentally. JAD estimates the bias of the performance using a bootstrap method, 43,44 and removes it to return the final performance estimate.…”
Section: Methods
confidence: 99%
“…On this dataset, we performed nested cross-validation by combining a three-way split of the data (training-validation-testing) with leave-one-out cross-validation (CV) and grid search for SVM parameter (box-constraint) tuning. This was done to avoid upward bias in the estimated performance metrics (Guyon and Elisseeff, 2003; Tsamardinos et al., 2018). Additionally, we avoided any bias in the selection of the most discriminatory threshold pair (i.e., z-score and percentage abnormality) to determine node abnormality by computing it at every step of cross-validation after removing the test subject (Smialowski et al., 2009).…”
Section: Predictive Model Design for Generalizability Assessment
confidence: 99%
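The nested cross-validation described in the statement above — an outer leave-one-out loop for testing, with parameter tuning confined to an inner loop over the remaining data — can be sketched generically. This is an illustrative skeleton, not the cited study's code: `fit_score` is a hypothetical callable standing in for the SVM fit/predict step, and the parameter grid is arbitrary.

```python
import numpy as np

def nested_loocv(X, y, param_grid, fit_score):
    """Nested leave-one-out CV (sketch).

    The outer loop holds out one sample for testing; the inner loop
    tunes the parameter via LOOCV on the rest, so the held-out sample
    never influences parameter selection (avoiding upward bias).

    fit_score(X_tr, y_tr, X_te, y_te, param) -> accuracy on X_te.
    """
    outer_scores = []
    for i in range(len(y)):
        tr = np.delete(np.arange(len(y)), i)
        X_tr, y_tr = X[tr], y[tr]
        # Inner LOOCV grid search, restricted to the training portion.
        best_p, best_s = None, -np.inf
        for p in param_grid:
            inner = []
            for j in range(len(y_tr)):
                it = np.delete(np.arange(len(y_tr)), j)
                inner.append(fit_score(X_tr[it], y_tr[it],
                                       X_tr[[j]], y_tr[[j]], p))
            s = float(np.mean(inner))
            if s > best_s:
                best_p, best_s = p, s
        # Evaluate the tuned parameter on the untouched outer sample.
        outer_scores.append(fit_score(X_tr, y_tr, X[[i]], y[[i]], best_p))
    return float(np.mean(outer_scores))
```

The outer-loop average is the unbiased generalization estimate; the inner loop's best score is never reported, since it reflects the tuning data.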
“…To do so, the sample was randomly divided into three mutually exclusive datasets: training (70%), validation (10%), and testing (20%). This process used the 10-fold cross-validation method with 500 iterations to estimate error ratios [37]. The first subset of data was used to train the models and estimate the parameters.…”
Section: Research Steps
confidence: 99%
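The mutually exclusive 70/10/20 split described above can be sketched as a simple index partition. This is a generic illustration, not the cited study's procedure; the function name and seed handling are my own.

```python
import numpy as np

def three_way_split(n, frac=(0.7, 0.1, 0.2), seed=0):
    """Randomly partition n sample indices into mutually exclusive
    training / validation / testing index sets (70/10/20 by default)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)          # shuffle once, then slice
    n_tr = round(frac[0] * n)
    n_va = round(frac[1] * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```

Because the three slices come from one permutation, every sample lands in exactly one set, which is what "mutually exclusive" requires.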