1997
DOI: 10.1002/(sici)1097-0258(19971230)16:24<2813::aid-sim701>3.0.co;2-z
Resampling and cross-validation techniques: a tool to reduce bias caused by model building?

Abstract: The process of model building involved in the analysis of many medical studies may lead to a considerable amount of over-optimism with respect to the predictive ability of the 'final' regression model. In this paper we illustrate this phenomenon in a simple cutpoint model and explore to what extent bias can be reduced by using cross-validation and bootstrap resampling. These computer intensive methods are compared to an ad hoc approach and to a heuristic method. Besides illustrating all proposals with the data…



Cited by 135 publications (81 citation statements)
References 12 publications
“…This type of effect is expected, because stepwise selection procedures during model derivation tend to overfit the model to the derivation data, thus leading to optimistic C statistics. 27 In the validation models with 15, 8, and then 5 items, the observed decreases in the C statistics demonstrate why such models need to be validated in distinct patient populations. Another part of the difference in the predictive performance of the derivation versus validation models may have to do with differences in the patient populations.…”
Section: Discussion
Confidence: 99%
“…Bootstrapping techniques were used for internal validation of the model (26,27), and bootstrap samples were drawn 200 times with replacement. Regression models were created in each bootstrap sample and tested on the original sample to obtain stable estimates of the optimism of the model, i.e., how much the model performance was expected to decrease when applied in new datasets (28-30). All analyses were performed using STATA version 13.1 (StataCorp LP) and SAS statistical software version 9.3 (SAS Institute).…”
Section: Statistical Analyses
Confidence: 99%
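The bootstrap optimism procedure this excerpt describes can be sketched as follows. This is an illustration only, not the cited study's code: the synthetic data, the logistic model, and the use of AUC (which equals the C statistic for a binary outcome) as the performance measure are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic derivation cohort (illustrative only).
n, p = 300, 5
X = rng.normal(size=(n, p))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

# Apparent performance: fit and evaluate on the same data (optimistic).
model = LogisticRegression().fit(X, y)
apparent_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])

# Bootstrap optimism: refit on each resample, then compare its AUC on
# the resample with its AUC on the original sample; the mean gap
# estimates how much performance shrinks in new data.
B = 200  # the excerpt reports 200 bootstrap samples drawn with replacement
optimism = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)  # sample with replacement
    m = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

corrected_auc = apparent_auc - np.mean(optimism)
print(f"apparent AUC:  {apparent_auc:.3f}")
print(f"mean optimism: {np.mean(optimism):.3f}")
print(f"corrected AUC: {corrected_auc:.3f}")
```

Subtracting the mean optimism from the apparent AUC yields the optimism-corrected estimate of performance in new datasets.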
“…11 Other, less common model approaches resort to complex mathematical analytics of the data. These models often utilize a broad range of methods involving machine learning and pattern recognition, among others, 12,13 and they are often, but not always, limited to classification trees, neural networks, and k-nearest neighbors. 13 The model is often trained on a large number of individuals of the cohort and validated on a fraction of the cohort data or on data from another study.…”
Section: Predictive Models
Confidence: 99%
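The train-then-validate scheme this excerpt describes (fit on most of the cohort, evaluate on a held-out fraction) can be sketched as below. A minimal sketch under stated assumptions: the cohort is synthetic, and the k-nearest-neighbor classifier and 25% holdout fraction are illustrative choices, not details from the cited papers.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Synthetic cohort (illustrative only).
n, p = 500, 4
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + rng.normal(size=n) > 0).astype(int)

# Train on most individuals, hold out a fraction for validation,
# keeping the outcome prevalence similar in both parts (stratify).
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=15).fit(X_tr, y_tr)
auc_train = roc_auc_score(y_tr, knn.predict_proba(X_tr)[:, 1])
auc_valid = roc_auc_score(y_va, knn.predict_proba(X_va)[:, 1])
print(f"derivation AUC: {auc_train:.3f}")
print(f"validation AUC: {auc_valid:.3f}")
```

The gap between derivation and validation AUC is the same optimism phenomenon the earlier excerpts discuss; validation on a distinct holdout (or an external study) gives the more honest estimate.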