Objective
To assess the methodological quality of studies on prediction models
developed using machine learning techniques across all medical
specialties.
Design
Systematic review.
Data sources
PubMed from 1 January 2018 to 31 December 2019.
Eligibility criteria
Articles reporting on the development, with or without external
validation, of a multivariable prediction model (diagnostic or
prognostic) developed using supervised machine learning for
individualised predictions. No restrictions applied for study design,
data source, or predicted patient related health outcomes.
Review methods
Methodological quality of the studies was determined and risk of
bias evaluated using the prediction risk of bias assessment tool
(PROBAST). This tool contains 21 signalling questions tailored to
identify potential biases in four domains. Risk of bias was rated for
each domain (participants, predictors, outcome, and analysis) and for
each study overall.
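How the four domain ratings combine into one overall judgment can be sketched as follows. This is a minimal illustration that assumes the aggregation rule described in the PROBAST guidance (any domain at high risk gives an overall high rating; all domains at low risk give an overall low rating; otherwise the rating is unclear). The abstract does not spell the rule out, and the function name and rating labels are hypothetical.

```python
from typing import Mapping

# The four PROBAST domains named in the review methods.
DOMAINS = ("participants", "predictors", "outcome", "analysis")
VALID_RATINGS = {"low", "high", "unclear"}


def overall_risk_of_bias(domain_ratings: Mapping[str, str]) -> str:
    """Combine per-domain ratings into an overall rating.

    Assumed rule (from the PROBAST guidance, not from this abstract):
    - "high" if at least one domain is rated high
    - "low" only if every domain is rated low
    - "unclear" otherwise
    """
    missing = [d for d in DOMAINS if d not in domain_ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")

    ratings = [domain_ratings[d] for d in DOMAINS]
    invalid = [r for r in ratings if r not in VALID_RATINGS]
    if invalid:
        raise ValueError(f"invalid ratings: {invalid}")

    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"


# Hypothetical example: one high-risk domain makes the whole analysis high risk.
example = {
    "participants": "low",
    "predictors": "low",
    "outcome": "unclear",
    "analysis": "high",
}
print(overall_risk_of_bias(example))  # high
```

Under this rule a single weak domain determines the overall rating, which is one reason the overall proportion at high risk can be large even when several domains are rated favourably.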
Results
152 studies were included: 58 (38%) included a diagnostic prediction
model and 94 (62%) a prognostic prediction model. PROBAST was applied to
152 developed models and 19 external validations. Of these 171 analyses,
148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of
bias. The analysis domain was most frequently rated at high risk of
bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an
inadequate number of events per candidate predictor, 62 handled missing
data inadequately (41%, 33% to 49%), and 59 assessed overfitting
improperly (39%, 31% to 47%). Appropriate data sources were used in
most model developments (73%, 66% to 79%) and external validations
(74%, 51% to 88%). Information about
blinding of outcome and blinding of predictors was, however, absent in
60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models,
respectively.
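The proportions and 95% confidence intervals reported above can be recomputed from the raw counts. The abstract does not state which interval method was used; the sketch below assumes the Wilson score interval, which appears consistent with the reported bounds after rounding. The function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(events: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.959964 corresponds to a two-sided 95% interval.
    The choice of the Wilson method is an assumption, not stated in the abstract.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")

    p = events / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# 148 of 171 analyses rated at high risk of bias.
low, high = wilson_interval(148, 171)
print(f"{148 / 171:.0%} ({low:.0%} to {high:.0%})")  # 87% (81% to 91%)
```

The Wilson interval is often preferred over the simple normal approximation for proportions near 0% or 100%, because its bounds stay within the valid range.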
Conclusion
Most studies on machine learning based prediction models show poor
methodological quality and are at high risk of bias. Factors
contributing to risk of bias include small study size, poor handling of
missing data, and failure to deal with overfitting. Efforts to improve
the design, conduct, reporting, and validation of such studies are
necessary to boost the application of machine learning based prediction
models in clinical practice.
Systematic review registration
PROSPERO CRD42019161764.