Background: The incidence of hepatocellular carcinoma (HCC) in China continues to rise, making early diagnosis and treatment crucial. This study aims to build personalized predictive models by combining machine learning with demographic, medical history, and non-invasive biomarker data. These models are intended to support clinicians' decision-making on HCC in patients with hepatitis B virus (HBV)-related cirrhosis and low serum alpha-fetoprotein (AFP) levels.
Methods: A total of 6,980 patients treated between January 2012 and December 2018 were included in the analysis. Pre-treatment laboratory test and clinical data were collected. Significant risk factors were selected, and the relative risk of each variable for HCC diagnosis was calculated using machine learning and univariate regression analysis. Finally, to build the machine learning models, the data set was randomly partitioned into a training set (80%) and a validation set (20%).
Results: This study identified 12 independent factors associated with HCC using Gaussian naïve Bayes (GNB), extreme gradient boosting (XGBoost), random forest (RF), and least absolute shrinkage and selection operator (LASSO) regression models. Multivariate analysis showed that male sex, age >60 years, alkaline phosphatase (ALP) >150 U/L, AFP >25 ng/mL, carcinoembryonic antigen (CEA) >5 ng/mL, and fibrinogen (Fbg) >4 g/L were risk factors, while hypertension, calcium <2.25 mmol/L, potassium ≤3.5 mmol/L, direct bilirubin (DB) >6.8 μmol/L, hemoglobin (HB) <110 g/L, and glutamic-pyruvic transaminase (GPT) >40 U/L were protective factors. A nomogram constructed from these factors showed an area under the curve (AUC) of 0.746 (sensitivity=0.710, specificity=0.646), significantly higher than the AFP AUC of 0.658 (sensitivity=0.462, specificity=0.766). Among the machine learning algorithms compared, the XGBoost model performed best in both the training set and the test set, with an AUC of 0.832 (sensitivity=0.745, specificity=0.766) and an independent validation AUC of 0.829 (sensitivity=0.766, specificity=0.737).
Conclusions: The proposed XGBoost model for classifying HCC in patients with HBV-related cirrhosis and low AFP levels showed promising ability for individualized prediction of HCC.
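
For readers unfamiliar with the evaluation terms used in this abstract, the following is a minimal sketch of a random 80%/20% partition and of how sensitivity, specificity, and AUC can be computed from predicted scores. It is not the study's code: the function names, the fixed seed, the 0.5 threshold, and the toy labels and scores are all illustrative assumptions, and the AUC is computed here with the pairwise-ranking definition rather than any particular library routine.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=0):
    """Randomly partition records into (training, validation) lists.

    The 0.8 default mirrors the 80%/20% split described in the Methods;
    the seed is an arbitrary assumption for reproducibility.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(labels, scores, threshold):
    """Labels: 1 = HCC, 0 = no HCC. A score >= threshold is predicted HCC."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]

train, validation = split_train_validation(range(10))
sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
toy_auc = auc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three for each model.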