Background
This study aimed to construct machine learning models to predict the risk of sepsis in patients with acute pancreatitis (AP), and to compare the best-performing model with the logistic regression (LR) model and scoring systems.
Methods
In this retrospective cohort study, data were collected from the Medical Information Mart for Intensive Care III (MIMIC-III) database between 2001 and 2012 and the MIMIC-IV database between 2008 and 2019. Patients were randomly divided into training and test sets (8:2). Least absolute shrinkage and selection operator (LASSO) regression with 5-fold cross-validation was used to screen and confirm the predictive factors. Based on the selected predictive factors, six machine learning models were constructed: support vector machine (SVM), K-nearest neighbour (KNN), multi-layer perceptron (MLP), LR, gradient boosting decision tree (GBDT), and adaptive boosting (AdaBoost). The models and scoring systems were evaluated and compared using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and the area under the receiver operating characteristic curve (AUC).
Results
A total of 1,672 patients were included. In the training set, 261 AP patients (19.51%) were diagnosed with sepsis. The predictive factors for the risk of sepsis in AP patients included age, insurance, vasopressors, mechanical ventilation, Glasgow Coma Scale (GCS), heart rate, respiratory rate, temperature, SpO2, platelet count, red blood cell distribution width (RDW), International Normalized Ratio (INR), and blood urea nitrogen (BUN). The AUC of the GBDT model for sepsis prediction in AP patients in the test set was 0.985. The GBDT model showed better performance in sepsis prediction than the LR model, the systemic inflammatory response syndrome (SIRS) score, the bedside index for severity in acute pancreatitis (BISAP) score, the sequential organ failure assessment (SOFA) score, the quick-SOFA (qSOFA) score, and the simplified acute physiology score II (SAPS II).
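For readers unfamiliar with the evaluation metrics used in these comparisons, the following is a minimal sketch of how sensitivity, specificity, PPV, NPV, accuracy, and AUC can be computed from true labels and predicted scores. The function names, the 0.5 threshold, and the toy data are assumptions for illustration and are not taken from the study. The sketch also assumes both classes and both predicted labels occur, so no denominator is zero.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = sepsis, 0 = no sepsis)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / len(y_true),
    }

def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = binary_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
```

Unlike the other five metrics, AUC does not depend on the chosen cut-off, which is one reason it is commonly used to compare models against scoring systems.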
Conclusion
The present findings suggest that the GBDT machine learning model predicted sepsis in AP patients better than the classical LR model and the SOFA, qSOFA, SAPS II, SIRS, and BISAP scores, and that it may be a useful tool for early identification of high-risk patients and timely clinical intervention.