Objective: Post-hepatectomy liver failure (PHLF) is a serious complication of liver resection. Machine learning algorithms have emerged as data-mining tools in recent years and have been reported to offer advantages over traditional statistical methods. This study compares several traditional machine learning algorithms and selects the best model for predicting PHLF.
Materials and Methods: We reviewed the data of patients who underwent resection of hepatocellular carcinoma at our hospital from January 2013 to October 2022 and randomly divided them into a training set and a validation set at a 7:3 ratio. Mutual information was used to screen for the 10 clinical characteristics most strongly associated with PHLF. Models were trained and validated using Logistic Regression, Decision Tree, Gradient Boosting Decision Tree (GBDT), Random Forest, Extreme Gradient Boosting (XGBoost), LightGBM, multi-model fusion (hard voting), and multi-model fusion (soft voting). The hyperparameters of each algorithm were searched to achieve the best fit. The algorithms were evaluated comprehensively by accuracy, precision, recall, F1 score, and the receiver operating characteristic (ROC) curve with its area under the curve (AUC). Clinical characteristics related to PHLF were then ranked by feature importance in the best model.
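The workflow described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic (generated to mimic only the cohort size and class imbalance), the split is stratified, feature selection is fitted on the training set only, hyperparameters are library defaults rather than searched values, and XGBoost and LightGBM are left out to avoid extra dependencies.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (NOT the study data):
# 319 samples, roughly 9% positives, 30 hypothetical candidate features.
X, y = make_classification(n_samples=319, n_features=30, n_informative=8,
                           weights=[0.906], random_state=0)

# 7:3 split; stratification is an assumption made here to keep the
# minority class represented in both sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)


def make_selector():
    """Keep the 10 features with the highest mutual information with y."""
    return SelectKBest(partial(mutual_info_classif, random_state=0), k=10)


def base_estimators():
    """Base learners for the voting ensembles (default hyperparameters)."""
    return [
        ("lr", make_pipeline(StandardScaler(),
                             LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("gbdt", GradientBoostingClassifier(random_state=0)),
    ]


models = {
    "gbdt": GradientBoostingClassifier(random_state=0),
    "hard_voting": VotingClassifier(base_estimators(), voting="hard"),
    "soft_voting": VotingClassifier(base_estimators(), voting="soft"),
}

results = {}
fitted = {}
for name, model in models.items():
    # Selection sits inside the pipeline so it is fitted on training data only.
    pipe = make_pipeline(make_selector(), model)
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_val)
    results[name] = {
        "train_accuracy": accuracy_score(y_train, pipe.predict(X_train)),
        "val_accuracy": accuracy_score(y_val, pred),
        "precision": precision_score(y_val, pred, zero_division=0),
        "recall": recall_score(y_val, pred, zero_division=0),
        "f1": f1_score(y_val, pred, zero_division=0),
    }
    # Hard voting returns class labels only, so AUC is computed just for
    # models that expose predicted probabilities.
    if name != "hard_voting":
        proba = pipe.predict_proba(X_val)[:, 1]
        results[name]["auc"] = roc_auc_score(y_val, proba)
    fitted[name] = pipe

for name, metrics in results.items():
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

Hard voting takes the majority class label across the base learners, whereas soft voting averages their predicted probabilities, which is why only the soft-voting ensemble yields a ROC curve directly.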
Results: A total of 319 patients were included in this study, of whom 9.4% (n = 30) were in the liver failure group. The 10 clinical characteristics most strongly associated with PHLF were preoperative platelet count, preoperative prothrombin time, perioperative blood loss, perioperative transfusion (Yes/No), duration of surgery, clinically significant portal hypertension (Yes/No), preoperative aspartate aminotransferase, preoperative albumin, preoperative total bilirubin, and type of resection (minor/major). XGBoost and LightGBM performed best on the training set, each with an accuracy of 1; however, their accuracy fell to 0.9375 and 0.9167, respectively, on the validation set. GBDT showed the least overfitting, with an accuracy of 0.9462 on the training set and 0.9479 on the validation set. Preoperative albumin, perioperative blood loss, preoperative platelet count, duration of surgery, and preoperative aspartate aminotransferase had the highest weights in GBDT. The accuracy of multi-model fusion (hard voting) was 0.9955 on the training set and 0.9583 on the validation set, while that of multi-model fusion (soft voting) was 0.9731 and 0.9479, respectively.
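One way to read these figures is through the gap between training and validation accuracy, where a smaller gap suggests less overfitting. The short sketch below computes that gap from the accuracies reported above; using the gap as the overfitting measure is an assumption made here for illustration, and the abstract does not report the corresponding figures for Logistic Regression, Decision Tree, or Random Forest.

```python
# Accuracies as reported in the Results: (training set, validation set).
reported = {
    "XGBoost": (1.0, 0.9375),
    "LightGBM": (1.0, 0.9167),
    "GBDT": (0.9462, 0.9479),
    "Hard voting": (0.9955, 0.9583),
    "Soft voting": (0.9731, 0.9479),
}

# Generalization gap: training accuracy minus validation accuracy.
gaps = {name: train - val for name, (train, val) in reported.items()}

# Order the models from smallest to largest absolute gap.
ranking = sorted(gaps, key=lambda name: abs(gaps[name]))

for name in ranking:
    print(f"{name:12s} gap = {gaps[name]:+.4f}")
```

Under this measure GBDT has the smallest gap, followed by soft voting, hard voting, XGBoost, and LightGBM.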
Conclusion: GBDT performed best among the traditional machine learning algorithms compared, while XGBoost and LightGBM still show considerable potential. Both multi-model fusion (hard voting) and multi-model fusion (soft voting) reduced overfitting to some extent. Preoperative albumin, perioperative blood loss, preoperative platelet count, duration of surgery, and preoperative aspartate aminotransferase were the five most important clinical characteristics.
Retrospectively registered: Ethics Y(2022)130; 2022/09/17