BACKGROUND
Healthcare-associated infections (HAIs) due to multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA) and C. difficile, place a significant burden on our healthcare infrastructure.
OBJECTIVE
Screening for MDROs is an important mechanism for preventing spread but is resource-intensive. Automated tools that predict colonization/infection risk from Electronic Health Record (EHR) data could provide useful information to aid infection control and guide empiric antibiotic coverage.
METHODS
We retrospectively develop a machine learning model to detect MRSA colonization and infection, at the time of sample collection, in undifferentiated hospitalized patients at the University of Virginia hospital. We build the model from clinical and non-clinical features derived from on-admission and throughout-stay information in the patient’s EHR data. Additionally, we use a class of features derived from contact networks in EHR data; these network features capture patients’ contacts with providers and other patients, improving model interpretability and accuracy for predicting the outcome of surveillance tests for MRSA. Finally, we explore heterogeneous models for different patient subpopulations, e.g., those admitted to an ICU or ED or with specific testing histories, which yield better performance.
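As an illustration of what a contact-network feature could look like, the sketch below counts, for each patient, the number of distinct providers seen and the number of other patients reachable through a shared provider. The input schema (a list of patient/provider pairs), the function name contact_features, and the two feature names are assumptions made for this example; the abstract does not specify which network features the study actually uses.

```python
from collections import defaultdict

def contact_features(visits):
    """Toy network features from (patient_id, provider_id) pairs.

    The schema and feature names are hypothetical, not taken from the study.
    Returns {patient_id: {"n_providers": int, "n_shared_patients": int}}.
    """
    providers_of = defaultdict(set)  # patient -> providers seen
    patients_of = defaultdict(set)   # provider -> patients seen
    for patient, provider in visits:
        providers_of[patient].add(provider)
        patients_of[provider].add(patient)

    features = {}
    for patient, providers in providers_of.items():
        # Collect every patient who shares at least one provider.
        shared = set()
        for provider in providers:
            shared |= patients_of[provider]
        shared.discard(patient)  # a patient is not their own contact
        features[patient] = {
            "n_providers": len(providers),
            "n_shared_patients": len(shared),
        }
    return features

# Made-up records for illustration only.
visits = [("p1", "drA"), ("p1", "drB"), ("p2", "drA"),
          ("p3", "drB"), ("p4", "drC")]
feats = contact_features(visits)
print(feats["p1"])
```

Counts like these can be appended to the clinical feature vector of each patient before model fitting.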
RESULTS
We find that logistic regression performs better than the other methods, and the performance (ROC-AUC) of this model improves by nearly 11% when we apply a second-degree polynomial transformation to the features. Features that are significant in predicting MDRO risk include antibiotic usage, surgery, device use, dialysis, the patient’s comorbid conditions, and network features. Among these, network features add the most value and improve model performance by at least 15%. The logistic regression model with the same feature transformation also performs better than other models for specific patient subpopulations.
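To make the two technical terms above concrete, the sketch below shows a second-degree polynomial expansion of a feature vector and a direct computation of ROC-AUC. The helper names poly2 and roc_auc and the toy data are assumptions for this example, not the study's code or data. The toy labels depend only on the product of two features, so a single raw feature ranks patients no better than chance while the interaction term separates the classes, which is one way a polynomial transformation can help a linear model such as logistic regression.

```python
from itertools import combinations_with_replacement

def poly2(row):
    """Original features plus all squares and pairwise products."""
    pairs = combinations_with_replacement(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in pairs]

def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: the label is 1 exactly when the two features share a sign.
rows = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, 0, 0]

# Score by the first raw feature, then by the x1*x2 interaction term.
auc_raw = roc_auc(labels, [r[0] for r in rows])
# poly2 output order for two features: x1, x2, x1*x1, x1*x2, x2*x2
auc_interaction = roc_auc(labels, [poly2(r)[3] for r in rows])
print(auc_raw, auc_interaction)
```

In practice the expanded features would be passed to a fitted classifier, and ROC-AUC would be computed on its predicted probabilities for held-out patients.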
CONCLUSIONS
Our work shows that machine learning methods using clinical and non-clinical features derived from EHR data can predict MRSA risk quite effectively. Network features are the most predictive and give a significant improvement over prior methods. Further, heterogeneous prediction models for different patient subpopulations enhance the model's performance.