External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients

Wong, Andrew; Ötleş, Erkin; Donnelly, John P.; Krumm, Andrew E.; McCullough, Jeffrey S.; DeTroyer-Cooley, Olivia; Pestrue, Justin; Phillips, M.; Konye, Judy; Penoza, Carleen; Ghous, Muhammad; Singh, Karandeep

doi:10.1001/jamainternmed.2021.2626

Cited by 431 publications

(320 citation statements)

References 24 publications

Supporting

Mentioning

313

Contrasting

Unclassified

Order By: Relevance

“…Similarly, Wong et al suggested a hospitalization-level AUROC based on the entire trajectory of predictions to enable more realistic evaluations 25 .…”

Section: Discussionmentioning

confidence: 99%

The Framing of machine learning risk prediction models illustrated by evaluation of sepsis in general wards

et al. 2021

View full text Add to dashboard Cite

Problem framing is critical to developing risk prediction models because all subsequent development work and evaluation takes place within the context of how a problem has been framed and explicit documentation of framing choices makes it easier to compare evaluation metrics between published studies. In this work, we introduce the basic concepts of framing, including prediction windows, observation windows, window shifts and event-triggers for a prediction that strongly affects the risk of clinician fatigue caused by false positives. Building on this, we apply four different framing structures to the same generic dataset, using a sepsis risk prediction model as an example, and evaluate how framing affects model performance and learning. Our results show that an apparently good model with strong evaluation results in both discrimination and calibration is not necessarily clinically usable. Therefore, it is important to assess the results of objective evaluations within the context of more subjective evaluations of how a model is framed.

show abstract

“…Similarly, Wong et al suggested a hospitalization-level AUROC based on the entire trajectory of predictions to enable more realistic evaluations 25 .…”

Section: Discussionmentioning

confidence: 99%

The Framing of machine learning risk prediction models illustrated by evaluation of sepsis in general wards

et al. 2021

View full text Add to dashboard Cite

show abstract

“…A recent independent validation of the Epic Sepsis Model indeed found decreased calibration and discrimination. 23 Low adherence rates when considering entire model reporting guidelines suggest opportunities to better operationalize reporting practices to ensure deployed models are useful, reliable and fair.…”

Section: Discussionmentioning

confidence: 99%

“…[10][11][12][13][14][15][16][17][18] Nevertheless, predictive models have been deployed in healthcare settings without transparency or independent validation, 19,20 and their subsequent failures have been met with public outcry. 2,[21][22][23] Adhering to model reporting guidelines is one way to improve the usefulness, [24][25][26][27][28] fairness, 29,30 and reliability 27,[31][32][33][34] of clinical predictive models. Reporting guidelines have long been used to assess the strength of clinical trial studies, 35,36 observational studies, 37 and diagnostic studies.…”

Section: Introductionmentioning

confidence: 99%

Low adherence to existing model reporting guidelines by commonly used clinical prediction models

Callahan

Patel

et al. 2021

Preprint

View full text Add to dashboard Cite

Objective: To assess whether the documentation available for commonly used machine learning models developed by an electronic health record (EHR) vendor provides information requested by model reporting guidelines. Materials and Methods: We identified items requested for reporting from model reporting guidelines published in computer science, biomedical informatics, and clinical journals, and merged similar items into representative "atoms". Four independent reviewers and one adjudicator assessed the degree to which model documentation for 12 models developed by Epic Systems reported the details requested in each atom. We present summary statistics of consensus, interrater agreement, and reporting rates of all atoms for the 12 models. Results: We identified 220 unique atoms across 15 model reporting guidelines. After examining the documentation for the 12 most commonly used Epic models, the independent reviewers had an interrater agreement of 76%. After adjudication, the model documentations' median completion rate of applicable atoms was 39% (range: 31%-47%). Most of the commonly requested atoms had reporting rates of 90% or above, including atoms concerning outcome definition, preprocessing, AUROC, internal validation and intended clinical use. For individual reporting guidelines, the median adherence rate for an entire guideline was 54% (range: 15%-71%). Atoms reported half the time or less included those relating to fairness (summary statistics and subgroup analyses, including for age, race/ethnicity, or sex), usefulness (net benefit, prediction time, warnings on out-of-scope use and when to stop use), and transparency (model coefficients). Atoms reported the least often related to missingness (missing data statistics, missingness strategy), validation (calibration plot, external validation), and monitoring (how models are updated/tuned, prediction monitoring). Conclusion: There are many recommendations about what should be reported about predictive models used to guide care. Existing model documentation examined in this study provides less than half of applicable atoms, and entire reporting guidelines have low adherence rates. Half or less of the reviewed documentation reported information related to usefulness, reliability, transparency and fairness of models. There is a need for better operationalization of reporting recommendations for predictive models in healthcare.

show abstract

“…Uncertain confidence in prognostic models may be particularly acute for COVID-19, as a recent systematic review of 31 prediction models for COVID-19 concluded that most published models have been poorly reported and were at high risk of bias [6]. Similar uncertainty regarding model performance when tested in external or independent samples has been described for other prediction models recently [7]. A cursory PubMed search using "prediction" and some ICU relevant conditions over a period of 20 months shows a plethora of publications (Fig.…”

mentioning

confidence: 88%

Assess COVID-19 prognosis … but be aware of your instrument’s accuracy!

2021

View full text Add to dashboard Cite

External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients

Cited by 431 publications

References 24 publications

The Framing of machine learning risk prediction models illustrated by evaluation of sepsis in general wards

The Framing of machine learning risk prediction models illustrated by evaluation of sepsis in general wards

Low adherence to existing model reporting guidelines by commonly used clinical prediction models

Assess COVID-19 prognosis … but be aware of your instrument’s accuracy!

Contact Info

Product

Resources

About