Background: Mortality among critically ill patients in intensive care units (ICUs) is relatively high. Clinicians use scoring systems such as the Acute Physiology and Chronic Health Evaluation III (APACHE III) and the Logistic Organ Dysfunction Score (LODS) to evaluate mortality risk and assess prognosis. In this research, we aimed to establish and compare multiple machine learning models that take the physiology subscores of APACHE III (namely, the Acute Physiology Score III, APS III) and the subscores of LODS as inputs, in order to obtain better performance for ICU mortality prediction. Methods: A total of 67,748 patients from the Medical Information Mart for Intensive Care (MIMIC-IV) database were enrolled, including 7055 deceased patients; the same number of surviving patients was selected by random downsampling, for a total of 14,110 patients included in the study. These patients were randomly divided into a training dataset (n = 9877) and a validation dataset (n = 4233). Fivefold cross-validation and grid search were used to find and evaluate the best hyperparameters for each machine learning model. Taking the subscores of LODS and the physiology subscores of APACHE III as input variables, four machine learning methods (XGBoost, logistic regression, support vector machine (SVM), and decision tree) were used to establish ICU mortality prediction models, with the area under the receiver operating characteristic curve (AUC) as the metric. AUCs, specificity, sensitivity, positive predictive value, negative predictive value, and calibration curves were used to find the best model. Results: For the prediction of mortality risk in ICU patients, the AUC of the XGBoost model was 0.918 (95% CI, 0.915–0.922), and the AUCs of logistic regression, SVM, and decision tree were 0.872 (95% CI, 0.867–0.877), 0.872 (95% CI, 0.867–0.877), and 0.852 (95% CI, 0.847–0.857), respectively.
In the calibration curves, logistic regression and support vector machine were better calibrated than the other two models in the ranges 0–40% and 70–100%, respectively, while XGBoost was better calibrated in the range of 40–70%. Conclusion: The mortality risk of ICU patients is better predicted by XGBoost, using the subscores of the Acute Physiology Score III and the Logistic Organ Dysfunction Score as features, in terms of ROC curve, sensitivity, and specificity. The XGBoost model could assist clinicians in judging the in-hospital outcome of critically ill patients, especially patients with a more uncertain survival outcome.
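The cohort construction and the AUC metric described in the Methods can be illustrated with a minimal sketch in plain Python. This is not the study's code: the function names, the random seed, and the 70/30 split fraction are assumptions (the fraction is inferred from n = 9877 and n = 4233), and the AUC is computed with the standard rank-based formula rather than any particular library.

```python
import random


def downsample_balanced(labels, seed=0):
    """Keep every positive (label 1) and an equal-size random sample of negatives (label 0)."""
    rng = random.Random(seed)  # the seed is an illustrative assumption
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(neg) < len(pos):
        raise ValueError("fewer negatives than positives; nothing to downsample")
    return sorted(pos + rng.sample(neg, len(pos)))


def train_validation_split(indices, train_fraction=0.7, seed=0):
    """Shuffle the indices and cut them into training and validation parts."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Uses average ranks, so tied scores count as half a win.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positive_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (positive_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Cohort sizes taken from the abstract: 67,748 patients, 7055 of them deceased.
labels = [1] * 7055 + [0] * (67748 - 7055)
kept = downsample_balanced(labels)
train, validation = train_validation_split(kept)
print(len(kept), len(train), len(validation))  # 14110 9877 4233

# Toy scores, not study data: one of the four positive/negative pairs is misordered.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

With the reported counts, the sketch reproduces the cohort sizes stated in the Methods (14,110 patients, split into 9877 and 4233). Model fitting, grid search, and confidence intervals are deliberately left out, since the abstract does not specify how they were implemented.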