Calibration is one of the main properties that must be accomplished by any predictive model. Overcoming the limitations of many approaches developed so far, a study has recently proposed the calibration belt as a graphical tool to identify ranges of probability where a model based on dichotomous outcomes miscalibrates. In this new approach, the relation between the logits of the probability predicted by a model and of the event rates observed in a sample is represented by a polynomial function, whose coefficients are fitted and its degree is fixed by a series of likelihood-ratio tests. We propose here a test associated with the calibration belt and show how the algorithm to select the polynomial degree affects the distribution of the test statistic. We calculate its exact distribution and confirm its validity via a numerical simulation. Starting from this distribution, we finally reappraise the procedure to construct the calibration belt and illustrate an application in the medical context.
Evaluating the goodness of fit of logistic regression models is crucial to ensure the accuracy of the estimated probabilities. Unfortunately, such evaluation is problematic in large samples. Because the power of traditional goodness of fit tests increases with the sample size, practically irrelevant discrepancies between estimated and true probabilities are increasingly likely to cause the rejection of the hypothesis of perfect fit in larger and larger samples. This phenomenon has been widely documented for popular goodness of fit tests, such as the Hosmer‐Lemeshow test. To address this limitation, we propose a modification of the Hosmer‐Lemeshow approach. By standardizing the noncentrality parameter that characterizes the alternative distribution of the Hosmer‐Lemeshow statistic, we introduce a parameter that measures the goodness of fit of a model but does not depend on the sample size. We provide the methodology to estimate this parameter and construct confidence intervals for it. Finally, we propose a formal statistical test to rigorously assess whether the fit of a model, albeit not perfect, is acceptable for practical purposes. The proposed method is compared in a simulation study with a competing modification of the Hosmer‐Lemeshow test, based on repeated subsampling. We provide a step‐by‐step illustration of our method using a model for postneonatal mortality developed in a large cohort of more than 300 000 observations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.