Background and Aim
Early recurrence (ER) presents a challenge for the survival prognosis of patients with hepatocellular carcinoma (HCC). The aim of this study was to investigate machine learning (ML) models using clinical data for predicting ER after microwave ablation (MWA).
Methods
Between August 2005 and December 2019, 1574 patients with early-stage HCC underwent MWA at four hospitals were reviewed. Then, 36 clinical data points per patient were collected, and the patients were assigned to the training, internal, and external validation set. Apart from traditional logistic regression (LR), three ML models—random forest, support vector machine, and eXtreme Gradient Boosting (XGBoost)—were built and validated for their predictive ability with the area under ROC curve (AUC). Algorithms such as SHapley Additive exPlanations (SHAP) and local interpretable model-agnostic explanations (LIME) were used to realize their interpretability.
Results
The three ML models all outperformed LR (
P
< 0.001 for all) in predictive ability. When nine variables (tumor number, platelet, α-fetoprotein, comorbidity score, white blood cell, cholinesterase, prothrombin time, neutrophils, and etiology) were extracted simultaneously using recursive feature elimination with cross-validation, the XGBoost model achieved the best discrimination among all models, with an AUC value 0.75 (95% CI [confidence interval]: 0.72–0.78) in the training set, 0.74 (95% CI: 0.69–0.80) in the internal validation set, and 0.76 (95% CI: 0.70–0.82) in the external validation set, and it was interpreted depending on the visualization of risk factors by the SHAP and LIME algorithms. The predictive system of post-ablation recurrence risk stratification was provided on online (
http://114.251.235.51:8001/
) based on XGboost analysis.
Conclusion
The XGBoost model based on clinical data can effectively predict ER risk after MWA, which can contribute to surveillance, prevention, and treatment strategies for HCC.