Background: Patients with sepsis complicated by anemia have a higher risk of mortality. Identifying the risk factors associated with prognosis in these patients is clinically important. The aim of this study was to establish a predictive model of in-hospital mortality using clinical data extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) database.

Methods: The clinical data of patients with sepsis complicated by anemia in the MIMIC-III database were retrospectively analyzed. Candidate indexes were screened by stepwise logistic regression (LR), and machine learning predictive models, namely Decision Tree (DT), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost), were developed and compared to identify the advantages and disadvantages of each model.

Results: A total of 13,547 patients with sepsis complicated by anemia were included in the study, of whom 1,827 died during hospitalization and 11,720 were alive at discharge. The preliminary stepwise regression model selected 20 clinical indexes, including the Elixhauser comorbidity index, maximum blood urea nitrogen (BUN), and maximum hemoglobin reduction. The predictive models showed good discriminative ability (area under the receiver operating characteristic curve [AUROC]: LR, 0.777; DT, 0.726; RF, 0.788; XGBoost, 0.815) and goodness of fit (area under the precision-recall curve [AUPRC]: LR, 0.350; DT, 0.290; RF, 0.400; XGBoost, 0.428). The Shapley Additive exPlanation (SHAP) values in the XGBoost model showed that the Elixhauser comorbidity index, maximum BUN, maximum hemoglobin reduction, ventilator use within 24 hours of admission, and age were significant features for predicting in-hospital mortality in patients with sepsis complicated by anemia.

Conclusions: The XGBoost model had better discrimination ability and goodness of fit than the other models. Machine learning algorithms have significant practical value in the development of an early warning system for patients with sepsis complicated by anemia.
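
As a reading aid for the two metrics reported above, the following minimal sketch computes AUROC and AUPRC from labels and predicted scores using only the Python standard library. It is not the study's pipeline, which the abstract does not specify: the function names `auroc` and `average_precision` are illustrative, AUPRC is approximated here by average precision (one common estimator), and the toy labels and scores are invented for demonstration. The only study figures used are the reported cohort counts (1,827 deaths among 13,547 patients).

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half). Quadratic time; fine for a small sketch."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: sum over distinct score thresholds of
    (increase in recall) * (precision at that threshold).
    Cases with equal scores are treated as a single threshold."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    ap, tp, i = 0.0, 0, 0
    while i < len(ranked):
        j, tp_group = i, 0
        # Gather every case that shares the current score.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp_group += ranked[j][1]
            j += 1
        tp += tp_group
        ap += (tp_group / n_pos) * (tp / j)  # delta recall * precision
        i = j
    return ap


if __name__ == "__main__":
    # Invented toy data (1 = died in hospital, 0 = alive at discharge).
    y = [1, 0, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    print("toy AUROC:", auroc(y, s))
    print("toy AUPRC:", average_precision(y, s))

    # Mortality prevalence from the reported cohort counts. An
    # uninformative model has an expected AUPRC equal to this value.
    print("AUPRC chance level:", 1827 / 13547)
```

Because an uninformative model scores about 0.5 on AUROC but only the outcome prevalence on AUPRC, the reported AUPRC values (0.290 to 0.428) are best read against a chance level of roughly 0.135 (1,827/13,547) rather than against 0.5.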