Background:
Despite research efforts, predicting the incidence and outcomes of Clostridioides difficile infection (CDI) remains challenging. This systematic review aimed to evaluate the performance of machine-learning (ML) models in predicting CDI incidence and complications using clinical data from electronic health records.
Methods:
We conducted a comprehensive search of databases (OVID, Embase, MEDLINE ALL, Web of Science, and Scopus) from inception to September 2023. Studies employing ML techniques to predict CDI or its complications were included. The primary outcomes were the types of ML models used and their performance, assessed using the area under the receiver operating characteristic curve (AUROC).
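For readers less familiar with the metric, the AUROC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who develops the outcome than to a randomly chosen patient who does not (ties count as one half). Under that reading, 0.5 is chance-level discrimination and 1.0 is perfect separation. The minimal Python sketch below computes it by direct pairwise comparison; the function name `auroc` and the toy labels and scores are hypothetical illustrations, not code or data from any of the reviewed studies.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUROC as the probability that a randomly chosen
    positive case (label 1) is scored higher than a randomly chosen negative
    case (label 0), counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:          # compare every positive case...
        for n in neg:      # ...with every negative case
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed CDI, 0 = did not; scores are predicted risks.
labels = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.6, 0.1, 0.8]
print(round(auroc(labels, scores), 3))  # prints 0.889 (8 of 9 pairs ranked correctly)
```

The pairwise loop is quadratic and meant only to make the definition concrete; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently on real cohorts.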
Results:
Twelve retrospective studies that evaluated CDI incidence and/or outcomes were included. The most commonly used ML models were random forest and gradient boosting. The AUROC ranged from 0.60 to 0.81 for predicting CDI incidence, 0.59 to 0.80 for predicting recurrence, and 0.64 to 0.88 for predicting complications. Advanced ML models performed similarly to traditional logistic regression. However, the definitions of CDI and of the outcomes studied (incidence, recurrence, and complications) varied considerably across studies, and most studies lacked external validation.
Conclusion:
ML models show promise in predicting CDI incidence and outcomes. However, the observed heterogeneity in CDI definitions and the lack of real-world validation highlight challenges in clinical implementation. Future research should focus on external validation and the use of standardized definitions across studies.