The process of model building involved in the analysis of many medical studies may lead to considerable over-optimism with respect to the predictive ability of the 'final' regression model. In this paper we illustrate this phenomenon in a simple cutpoint model and explore to what extent the bias can be reduced by using cross-validation and bootstrap resampling. These computer-intensive methods are compared to an ad hoc approach and to a heuristic method. Besides illustrating all proposals with the data from a breast cancer study, we perform a simulation study in order to assess the quality of the methods.
When investigating the effects of potential prognostic or risk factors that have been measured on a quantitative scale, values of these factors are often categorized into two groups. Sometimes an 'optimal' cutpoint is chosen that gives the best separation in terms of a two-sample test statistic. It is well known that this approach leads to a serious inflation of the type I error and to an overestimation of the effect of the prognostic or risk factor in absolute terms. In this paper, we illustrate that the resulting confidence intervals are similarly affected. We show that the application of a shrinkage procedure to correct for bias, together with bootstrap resampling for estimating the variance, yields confidence intervals for the effect of a potential prognostic or risk factor with the desired coverage.
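The inflation of the type I error described above can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' procedure: the choice of a pooled two-sample t statistic, the sample size, the 10% trimming of candidate cutpoints, and the function names `t_statistic` and `simulate_type1_error` are all assumptions introduced here. The factor and the outcome are generated independently, so every rejection is a false positive.

```python
import math
import random


def t_statistic(sum_lo, sumsq_lo, n_lo, sum_all, sumsq_all, n_all):
    """Pooled-variance two-sample t statistic computed from running sums."""
    n_hi = n_all - n_lo
    mean_lo = sum_lo / n_lo
    mean_hi = (sum_all - sum_lo) / n_hi
    ss_lo = sumsq_lo - n_lo * mean_lo ** 2
    ss_hi = (sumsq_all - sumsq_lo) - n_hi * mean_hi ** 2
    pooled_var = (ss_lo + ss_hi) / (n_all - 2)
    return (mean_hi - mean_lo) / math.sqrt(pooled_var * (1 / n_lo + 1 / n_hi))


def simulate_type1_error(n=100, n_sim=400, trim=0.1, crit=1.984, seed=1):
    """Estimate rejection rates under the null hypothesis of no effect.

    crit is the approximate two-sided 5% critical value of a t
    distribution with n - 2 = 98 degrees of freedom (assumed setup).
    Returns (rate with a fixed median cutpoint, rate with the cutpoint
    chosen to maximise |t|).
    """
    rng = random.Random(seed)
    k_min = int(trim * n)      # smallest allowed size of the lower group
    k_max = n - k_min          # largest allowed size of the lower group
    reject_fixed = 0
    reject_optimal = 0
    for _ in range(n_sim):
        # Factor x and outcome y are independent: the null is true.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        y_by_x = [yi for _, yi in sorted(zip(x, y))]
        sum_all = sum(y_by_x)
        sumsq_all = sum(v * v for v in y_by_x)
        s = sq = 0.0
        t_by_split = {}
        for k, v in enumerate(y_by_x, start=1):
            s += v
            sq += v * v
            if k_min <= k <= k_max:
                # Lower group: the k observations with the smallest x.
                t_by_split[k] = t_statistic(s, sq, k, sum_all, sumsq_all, n)
        reject_fixed += abs(t_by_split[n // 2]) > crit
        reject_optimal += max(abs(t) for t in t_by_split.values()) > crit
    return reject_fixed / n_sim, reject_optimal / n_sim


if __name__ == "__main__":
    fixed_rate, optimal_rate = simulate_type1_error()
    print(f"fixed median cutpoint:  {fixed_rate:.3f}")
    print(f"'optimal' cutpoint:     {optimal_rate:.3f}")
```

Because the median split is one of the candidate cutpoints, the 'optimal' rule rejects whenever the fixed rule does, so its estimated rejection rate can never be lower. The fixed rule should stay near the nominal 5% level, while the gap between the two rates gives a rough sense of the inflation under these assumed settings.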