Background
Clostridioides difficile infection (CDI) is a leading cause of healthcare-associated infection and may result in organ dysfunction, colectomy, and death. Published risk scores for predicting severe complications of CDI perform poorly on external validation. We hypothesized that a model built and validated using geographically and temporally distinct cohorts would more accurately predict the risk of complications from CDI.
Methods
We conducted a multi-center retrospective cohort study of adults diagnosed with CDI. After randomly partitioning the data into training and validation sets, we developed and compared three machine learning algorithms (Lasso regression, random forest, and stacked ensemble), each tuned with 10-fold cross-validation, to predict disease-related complications (intensive care unit admission, colectomy, or death attributable to CDI) within 30 days of diagnosis. Model performance was assessed using the area under the receiver operating characteristic curve (AUC).
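As an illustration of the evaluation steps named above, the following is a minimal sketch in plain Python, not the study's actual code (the software used is not stated here). It shows one way to assign records to 10 random folds and to compute AUC as the probability that a randomly chosen patient with a complication receives a higher predicted risk than a randomly chosen patient without one. The function names `kfold_indices` and `auc` and the toy data are assumptions made for illustration only.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k disjoint folds.

    Hypothetical helper: each fold serves once as the held-out set
    while the remaining k-1 folds are used for fitting.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 = complication, 0 = no complication.
    scores: predicted risk, higher means more likely to have a complication.
    Ties between a positive and a negative case count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values, not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs are ordered correctly

folds = kfold_indices(25, k=10)
```

The pairwise form is quadratic in cohort size, which is acceptable for a few thousand patients; a rank-based formula gives the same value more efficiently for larger data sets.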
Results
A total of 3,646 patients with CDI were included, of whom 217 (6%) had complications. All three models performed well (AUC 0.88-0.89). Variables of importance were similar across models and included albumin, bicarbonate, change in creatinine, non-CDI-related ICU admission, and concomitant non-CDI antibiotics. Sensitivity analyses indicated that model performance was robust to variation in derivation cohort inclusion and CDI testing approach. However, race was an important modifier, with models performing worse in non-White patients.
Conclusion
Using a large, heterogeneous population of patients, we developed and validated a prediction model that estimates the risk of complications from CDI with good discrimination (AUC 0.88-0.89). Future studies should aim to reduce the disparity in model accuracy between White and non-White patients and to improve overall performance.