2007
DOI: 10.1097/01.ccm.0000275267.64078.b0

Assessing the calibration of mortality benchmarks in critical care: The Hosmer-Lemeshow test revisited*

Abstract: Caution should be used in interpreting the calibration of predictive models developed using a smaller data set when applied to larger numbers of patients. A significant Hosmer-Lemeshow test does not necessarily mean that a predictive model is not useful or suspect. While decisions concerning a mortality model's suitability should include the Hosmer-Lemeshow test, additional information needs to be taken into consideration. This includes the overall number of patients, the observed and predicted probabilities w…

Cited by 739 publications (566 citation statements); references 19 publications.

Citation statements (ordered by relevance):
“…With regards to calibration of the model, the Hosmer-Lemeshow goodness of fit test was significant (p < 0.0001); however, this test has been shown to perform poorly with large sample sizes. [11][12][13] We measured the overall model performance, based on the overall model discrimination and calibration, using the Brier score, where a score of 0 indicates a perfect model and a score of 0.25 indicates a noninformative model. The Brier score was 0.021, indicating good model performance.…”
Section: Development and Validation Of The Unplanned Intubation Risk mentioning
confidence: 99%
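
As a rough sketch of the Brier score described in the statement above (Python; the outcome and probability arrays below are hypothetical, not data from the cited study), the score is simply the mean squared difference between predicted probabilities and observed binary outcomes:

import numpy as np
from sklearn.metrics import brier_score_loss

# Hypothetical outcomes (1 = event) and model-predicted probabilities.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
y_prob = np.array([0.02, 0.10, 0.85, 0.05, 0.60, 0.01, 0.15, 0.08, 0.90, 0.03])

# Mean squared difference between prediction and outcome: 0 is a perfect model,
# 0.25 corresponds to an uninformative model that predicts 0.5 for every patient.
brier_manual = np.mean((y_prob - y_true) ** 2)
brier_sklearn = brier_score_loss(y_true, y_prob)  # same quantity via scikit-learn
print(f"Brier score: {brier_manual:.3f} (sklearn: {brier_sklearn:.3f})")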
“…If the model calibrates well, there will not be a substantial deviation from the 45° line of perfect fit or bisector. On the contrary, miscalibration of the model will be a function of expected probability. The H-L test is easy to compute and its interpretation is intuitive, but it has acknowledged limitations such as being very sensitive to sample size [6][7][8]. The traditional plot or calibration curve also has some disadvantages: first, rather than a curve, it is a jagged line connecting the points in the plot; second, it is not accompanied by any information on the statistical significance of deviations from the bisector [9].…”
mentioning
confidence: 99%
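
To make the decile-of-risk construction behind the H-L test concrete, here is a minimal sketch (Python; the function and variable names are illustrative, not taken from the cited papers) that groups patients by predicted risk and compares observed with expected event counts in each group:

import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    # Partition patients into roughly equal-sized strata by predicted risk (deciles by default).
    data = pd.DataFrame({"y": y_true, "p": y_prob})
    data["group"] = pd.qcut(data["p"], q=n_groups, labels=False, duplicates="drop")
    grouped = data.groupby("group")
    obs = grouped["y"].sum()    # observed events per group
    exp = grouped["p"].sum()    # expected events per group
    n = grouped["y"].count()    # patients per group
    # Chi-square-type statistic over observed vs. expected events and non-events.
    hl_stat = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
    g = len(obs)
    p_value = chi2.sf(hl_stat, g - 2)  # conventional g - 2 degrees of freedom
    return hl_stat, p_value

Because the statistic grows with sample size for a fixed degree of miscalibration, very large cohorts make even trivial deviations statistically significant, which is the sensitivity the citing authors refer to.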
“…We can argue that the H-L statistics associated with the traditional calibration plot would have provided more or less the same information. In fact, such calibration curve plots are based on strata from the H-L goodness-of-fit test, and these deciles of risk would have also been able to indicate the direction, extent, and risk classes affected by deviations [5][6][7]. However, we must consider the new approach as a step forward because the H-L statistic and the traditional calibration plot use the average risk in each risk class, whereas the new method is based on a continuous function that does not average the risk in subgroups.…”
mentioning
confidence: 99%
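
The contrast drawn here, binned risk classes versus a continuous calibration function, can be sketched as follows (Python; the simulated data and the lowess smoother are illustrative choices, not the method of the cited paper):

import numpy as np
from sklearn.calibration import calibration_curve
from statsmodels.nonparametric.smoothers_lowess import lowess

# Hypothetical predictions and outcomes, simulated to be roughly well calibrated.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0.01, 0.60, size=2000)
y_true = rng.binomial(1, y_prob)

# Decile-style calibration points (average observed vs. predicted risk per stratum),
# mirroring the H-L strata; the points are judged against the 45-degree line.
obs_bin, pred_bin = calibration_curve(y_true, y_prob, n_bins=10, strategy="quantile")

# A continuous alternative: smooth the outcomes against predicted risk without
# averaging within discrete risk classes.
smooth = lowess(y_true, y_prob, frac=0.3)  # columns: sorted predicted risk, smoothed observed rate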