Background
Prediction models with high accuracy for patients with nonmetastatic cervical cancer (CC) are limited. This study aimed to construct and compare machine learning (ML) models for predicting the 5‐year survival status of CC patients using the Surveillance, Epidemiology, and End Results (SEER) public database of the National Cancer Institute.
Methods
Data registered from 2004 to 2016 were extracted and randomly divided into training and validation cohorts (8:2). Least absolute shrinkage and selection operator (LASSO) regression was employed to identify significant factors. Four predictive models were then constructed: logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost). The models were evaluated and compared using receiver operating characteristic (ROC) curves with areas under the curve (AUCs) and decision curve analysis (DCA).
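The 8:2 random split and the AUC metric described above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' code: the function names `split_train_validation` and `roc_auc`, the seed, and the toy scores are assumptions made for illustration. The AUC is computed from the rank-sum (Mann-Whitney) identity, with tied scores sharing an average rank.

```python
import random


def split_train_validation(records, train_fraction=0.8, seed=0):
    """Shuffle records and split them into training and validation lists.

    The seed and the floor rounding of the training size are illustrative
    assumptions; the abstract only states a random 8:2 split.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: 0/1 values, where 1 marks the outcome being predicted.
    scores: predicted probabilities or any monotone risk score.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Cohort sizes implied by an 8:2 split of 13,802 records.
    train, validation = split_train_validation(range(13802))
    print(len(train), len(validation))

    # Toy example with made-up labels and scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice the four classifiers would be fitted with established libraries, and a function such as `roc_auc` would be applied to each model's predicted probabilities in the training and validation cohorts separately.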
Results
A total of 13,802 patients were included and divided into training (N = 11,041) and validation (N = 2,761) cohorts. LASSO regression identified seven factors. In the training cohort, the XGBoost model showed the best performance (AUC = 0.8400) compared with the other three models (all p < 0.05 by DeLong's test). In the validation cohort, the XGBoost model also demonstrated better predictive ability (AUC = 0.8365) than the LR and SVM models (both p < 0.05 by DeLong's test), although the difference between the XGBoost and RF models was not statistically significant (p = 0.4251 by DeLong's test). Based on the DCA results, the XGBoost model was also superior, and feature importance analysis indicated that tumor stage was the most important variable among the seven factors.
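The quantity that decision curve analysis plots is the net benefit at a chosen threshold probability. The sketch below uses the standard definition (true positives per patient minus false positives per patient, weighted by the odds of the threshold). The function names and the toy data are assumptions for illustration and do not reproduce the study's analysis.

```python
def net_benefit(labels, probabilities, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    labels: 0/1 values, where 1 marks the outcome being predicted.
    probabilities: predicted risk for each patient.
    threshold: threshold probability, strictly between 0 and 1.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(
        1 for y, p in zip(labels, probabilities) if p >= threshold and y == 1
    )
    false_pos = sum(
        1 for y, p in zip(labels, probabilities) if p >= threshold and y == 0
    )
    odds = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(labels, threshold):
    """Reference strategy in which every patient is classified as high risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    labels = [1, 1, 0, 0]
    probabilities = [0.9, 0.6, 0.7, 0.2]
    for threshold in (0.25, 0.5, 0.75):
        print(
            threshold,
            net_benefit(labels, probabilities, threshold),
            net_benefit_treat_all(labels, threshold),
        )
```

A decision curve repeats this calculation over a range of thresholds for each model and compares the resulting curves with the treat-all and treat-none (net benefit of zero) strategies; a model is preferred where its curve lies highest.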
Conclusions
The XGBoost model proved to be an effective algorithm, with the best overall predictive performance among the four models compared. This model is proposed to support better decision‐making for nonmetastatic CC patients in the future.