Carl Moons and colleagues provide a checklist and background explanation for critically appraising and extracting data from systematic reviews of prognostic and diagnostic prediction modelling studies.
Background: Before considering whether to use a multivariable (diagnostic or prognostic) prediction model, it is essential that its performance be evaluated in data that were not used to develop the model (referred to as external validation). We critically appraised the methodological conduct and reporting of external validation studies of multivariable prediction models.
Methods: We conducted a systematic review of articles describing some form of external validation of one or more multivariable prediction models, indexed in PubMed core clinical journals and published in 2010. Study data were extracted in duplicate on design, sample size, handling of missing data, reference to the original study developing the prediction models, and predictive performance measures.
Results: 11,826 articles were identified and 78 were included for full review, which described the evaluation of 120 prediction models in participant data that were not used to develop the model. Thirty-three articles described both the development of a prediction model and an evaluation of its performance on a separate dataset, and 45 articles described only the evaluation of an existing published prediction model on another dataset. Fifty-seven percent of the prediction models were presented and evaluated as simplified scoring systems. Sixteen percent of articles failed to report the number of outcome events in the validation datasets. Fifty-four percent of studies made no explicit mention of missing data. Sixty-seven percent did not report evaluating model calibration, whilst most studies evaluated model discrimination. It was often unclear whether the reported performance measures were for the full regression model or for the simplified models.
Conclusions: The vast majority of studies describing some form of external validation of a multivariable prediction model were poorly reported, with key details frequently not presented. The validation studies were characterised by poor design, inappropriate handling and acknowledgement of missing data, and frequent omission of calibration, one of the key performance measures of prediction models. It may therefore not be surprising that an overwhelming majority of developed prediction models are not used in practice, given the dearth of well-conducted and clearly reported external validation studies describing their performance on independent participant data.
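As a concrete illustration of the calibration check that two thirds of the reviewed studies did not report, the following sketch (synthetic data; all variable names and coefficients are illustrative, not taken from any reviewed study) estimates calibration-in-the-large and the calibration slope by regressing the observed outcomes of an external dataset on the linear predictor of a previously developed model.

```python
# Hypothetical sketch: assessing calibration of an existing prediction model
# on external validation data (synthetic data; names are illustrative).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Synthetic "external" validation data with a known true model.
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
true_lp = -1.0 + 0.8 * x1 + 0.5 * x2               # true linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))    # observed binary outcomes

# Linear predictor of a previously developed model (deliberately
# miscalibrated here, mimicking overfitted development coefficients).
lp = -0.7 + 1.2 * x1 + 0.9 * x2

# Calibration-in-the-large: intercept of a logistic model with lp as offset.
cal_large = sm.GLM(y, np.ones((n, 1)), family=sm.families.Binomial(),
                   offset=lp).fit()
# Calibration slope: coefficient of lp when regressing y on lp.
cal_slope = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()

print("calibration-in-the-large:", round(cal_large.params[0], 3))
print("calibration slope:", round(cal_slope.params[1], 3))
```

A slope well below 1 signals that the original model's predictions are too extreme (typically from overfitting); reporting both statistics alongside a discrimination measure would address the gap noted above.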
Background: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for the substantial differences between these extensive simulation studies.
Methods: The current study uses Monte Carlo simulations to evaluate small-sample bias, coverage of confidence intervals, and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and by a modified estimation procedure, known as Firth's correction, are compared.
Results: The results show that, besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated datasets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation.
Conclusions: The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance on sample size considerations for binary logistic regression analysis.
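The sketch below is a minimal illustration, not the authors' simulation code, of the kind of Monte Carlo experiment described: data are generated from a known logistic model at a chosen EPV, the maximum-likelihood coefficient estimates are compared with the true values, and replicates showing (quasi-)separation are flagged with a crude convergence/extreme-estimate check. The true coefficients, target EPV, and separation threshold are assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code): Monte Carlo bias of a logistic
# regression coefficient at low events per variable (EPV), with a crude check
# for separation based on non-convergence or extreme estimates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
TRUE_BETA = np.array([-1.5, 0.7, 0.7, 0.7])   # intercept + 3 covariates
N_COVARIATES = 3
EPV = 5                                        # target events per variable
N_SIM = 500

estimates, separated = [], 0
for _ in range(N_SIM):
    # Choose n so that the expected number of events roughly matches the
    # target EPV (event_rate is an approximate marginal event probability).
    event_rate = 0.2
    n = int(EPV * N_COVARIATES / event_rate)
    X = sm.add_constant(rng.normal(size=(n, N_COVARIATES)))
    p = 1 / (1 + np.exp(-X @ TRUE_BETA))
    y = rng.binomial(1, p)
    try:
        fit = sm.Logit(y, X).fit(disp=0, maxiter=100)
        if not fit.mle_retvals["converged"] or np.abs(fit.params).max() > 10:
            separated += 1          # treat as a (quasi-)separated replicate
            continue
        estimates.append(fit.params[1])
    except Exception:
        separated += 1              # e.g. perfect separation raised an error

estimates = np.array(estimates)
print(f"mean estimate of beta1: {estimates.mean():.3f} (true {TRUE_BETA[1]})")
print(f"relative bias: {(estimates.mean() - TRUE_BETA[1]) / TRUE_BETA[1]:.1%}")
print(f"replicates flagged as separated: {separated}/{N_SIM}")
```

Note how the final bias estimate depends on what is done with the flagged replicates, which is exactly the point made in the Results: silently discarding, retaining, or refitting separated datasets (for example with Firth's penalised likelihood) can lead to quite different conclusions about EPV.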
Latent class models (LCMs) combine the results of multiple diagnostic tests through a statistical model to obtain estimates of disease prevalence and diagnostic test accuracy in situations where there is no single, accurate reference standard. We performed a systematic review of the methodology and reporting of LCMs in diagnostic accuracy studies. This review shows that the use of LCMs in such studies increased sharply in the past decade, notably in the domain of infectious diseases (overall contribution: 59%). The 64 reviewed studies used a range of differently specified parametric latent variable models, applying both Bayesian and frequentist methods. The critical assumption underlying the majority of LCM applications (61%) is that the test observations are conditionally independent within each of the two latent classes. Because violations of this assumption can lead to biased estimates of accuracy and prevalence, performing and reporting checks of whether the assumptions are met is essential. Unfortunately, our review shows that 28% of the included studies failed to report any information that enables verification of model assumptions or performance. Because of the lack of information on model fit and of adequate evidence "external" to the LCMs, it is often difficult for readers to judge the validity of LCM-based inferences and the conclusions reached.
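To make the conditional-independence assumption concrete, the following sketch fits a two-class latent class model to binary results from three hypothetical tests with a simple EM algorithm; within each latent class the tests are treated as independent, which is precisely the assumption whose violation can bias the estimates. The data are simulated and the parameter names (prevalence, sensitivity, specificity) are illustrative, not drawn from any reviewed study.

```python
# Illustrative two-class latent class model for three binary diagnostic tests,
# fitted by EM under the conditional-independence assumption (synthetic data).
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate data: three imperfect tests, no gold standard observed --------
n, true_prev = 2000, 0.3
true_sens = np.array([0.85, 0.75, 0.90])
true_spec = np.array([0.95, 0.90, 0.80])
disease = rng.binomial(1, true_prev, size=n)
p_pos = disease[:, None] * true_sens + (1 - disease[:, None]) * (1 - true_spec)
tests = rng.binomial(1, p_pos)                 # n x 3 matrix of 0/1 results

# --- EM algorithm ------------------------------------------------------------
prev, sens, spec = 0.5, np.full(3, 0.7), np.full(3, 0.7)   # starting values
for _ in range(500):
    # E-step: posterior probability of disease given the test pattern, using
    # conditional independence of the tests within each latent class.
    lik_dis = prev * np.prod(sens**tests * (1 - sens)**(1 - tests), axis=1)
    lik_non = (1 - prev) * np.prod((1 - spec)**tests * spec**(1 - tests), axis=1)
    post = lik_dis / (lik_dis + lik_non)
    # M-step: update prevalence, sensitivities and specificities.
    prev = post.mean()
    sens = (post[:, None] * tests).sum(axis=0) / post.sum()
    spec = ((1 - post)[:, None] * (1 - tests)).sum(axis=0) / (1 - post).sum()

print("estimated prevalence:", round(prev, 3))
print("estimated sensitivities:", sens.round(3))
print("estimated specificities:", spec.round(3))
```

With three tests and two classes this model is just identified (seven parameters against seven degrees of freedom in the observed table), leaving no room for internal goodness-of-fit checks; this is one reason why reporting assumption checks and evidence external to the LCM matters.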
To cite this article: Hendriksen JMT, Geersing GJ, Moons KGM, de Groot JAH. Diagnostic and prognostic prediction models. J Thromb Haemost 2013; 11 (Suppl. 1): 129-41.
Summary. Risk prediction models can be used to estimate the probability of either having (diagnostic model) or developing (prognostic model) a particular disease or outcome. In clinical practice, these models are used to inform patients and guide therapeutic management. Examples from the field of venous thrombo-embolism (VTE) include the Wells rule for patients suspected of deep venous thrombosis and pulmonary embolism, and more recently prediction rules to estimate the risk of recurrence after a first episode of unprovoked VTE. In this paper, the three phases that are recommended before a prediction model may be used in daily practice are described: development, validation, and impact assessment. In the development phase, the model is typically built using a multivariable logistic (diagnostic) or survival (prognostic) regression analysis, and its performance is expressed in terms of discrimination, calibration, and (re-)classification. In the validation phase, the developed model is tested in a new set of patients using these same performance measures. This is important, as model performance is commonly poorer in a new set of patients, e.g. due to case-mix or domain differences. Finally, in the impact phase, the ability of a prediction model to actually guide patient management is evaluated. Whereas single-cohort designs are preferred in the development and validation phases, this last phase calls for comparative, ideally randomized, designs in which therapeutic management and outcomes after using the prediction model are compared with those of a control group not using the model (e.g. usual care).
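A minimal sketch of the first two phases, using assumed synthetic cohorts, is given below: a diagnostic logistic model is developed on one cohort and its discrimination (c-statistic) is then re-estimated on a second cohort with a more homogeneous case-mix, which typically lowers the c-statistic. The impact phase, which requires a comparative (ideally randomized) design, is not shown.

```python
# Minimal sketch of the development and validation phases on synthetic data.
# Predictor names and coefficients are illustrative only.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

def make_cohort(n, spread=1.0):
    """Simulate a cohort; a smaller `spread` mimics a more homogeneous case-mix."""
    X = rng.normal(scale=spread, size=(n, 2))
    lp = -1.0 + 0.9 * X[:, 0] + 0.6 * X[:, 1]
    y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    return sm.add_constant(X), y

# Development phase: fit the multivariable logistic model.
X_dev, y_dev = make_cohort(800)
model = sm.Logit(y_dev, X_dev).fit(disp=0)

# Validation phase: apply the fixed model to a new cohort with a narrower
# case-mix and re-estimate discrimination (c-statistic / AUC).
X_val, y_val = make_cohort(800, spread=0.6)
print("c-statistic (development):", round(roc_auc_score(y_dev, model.predict(X_dev)), 3))
print("c-statistic (validation): ", round(roc_auc_score(y_val, model.predict(X_val)), 3))
```

In a full validation study the calibration of the fixed model would be examined in the new cohort as well, using the same kind of intercept and slope estimates sketched earlier in this section.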