Background: In recent years, machine learning methods have been increasingly explored in cancer prognosis because of the appearance of improved algorithms that can model censored data, such as support vector machines for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazard regression) or machine learning-based prognostic models have better predictive performance.

Objective: This study aimed to compare the performance of breast cancer prognostic prediction models based on machine learning and Cox regression.

Methods: This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized in Fudan University Shanghai Cancer Center between January 1, 2008, and December 31, 2016. After all exclusions, a total of 22,176 cases with 21 features were eligible for model development. The data set was randomly split into a training set (15,523 cases, 70%) and a test set (6653 cases, 30%) for developing 4 models and predicting the overall survival of patients diagnosed with breast cancer. The discriminative ability of the models was evaluated by the concordance index (C-index), the time-dependent area under the curve, and the D-index; the calibration ability of the models was evaluated by the Brier score.

Results: The RSF model showed the best discriminative performance among the 4 models, with 3-year, 5-year, and 10-year time-dependent areas under the curve of 0.857, 0.838, and 0.781, respectively, a D-index of 7.643 (95% CI 6.542, 8.930), and a C-index of 0.827 (95% CI 0.809, 0.845). When the differences in C-index were tested statistically, the RSF model significantly outperformed the Cox-EN (elastic net) model (C-index 0.816, 95% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95% CI 0.793, 0.832; P<.001). The 3-year, 5-year, and 10-year Brier scores of the 4 models were very close, ranging from 0.027 to 0.094 (all below 0.1), which indicated that all models had good calibration. In terms of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 features for predicting the prognosis of breast cancer. A final online tool was developed to predict the overall survival of patients with breast cancer.

Conclusions: The RSF model slightly outperformed the other models on discriminative ability, revealing the potential of the RSF method as an effective approach to building prognostic prediction models in the context of survival analysis.
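To make the discrimination metric concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the authors' code: the function name `harrell_c_index` and the toy data are hypothetical, pairs with tied survival times are simply skipped, and the study most likely relied on an established library implementation.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's follow-up time.
    The pair is concordant when the model gives i the higher risk.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, 1 = death observed, 0 = censored.
toy_times = [2.0, 4.0, 6.0, 8.0]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.7, 0.5, 0.1]  # higher score = higher predicted risk

print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-indices of 0.812 to 0.827 should be read.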
BACKGROUND: In recent years, machine learning (ML) methods have been increasingly explored in cancer prognosis prediction because of the appearance of improved algorithms that can model censored data, such as support vector machines (SVM) for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazard regression) or ML-based prognostic prediction models have better predictive performance.

OBJECTIVE: This study aims to use machine learning algorithms to predict the survival of patients with breast cancer and to compare their predictive performance with that of traditional Cox regression.

METHODS: This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized in Fudan University Shanghai Cancer Center (FUSCC) between January 1, 2008, and December 31, 2016. A total of 25,267 cases with 21 features were eligible for model development, and the data set was randomly split into a training set (70%) and a test set (30%) for developing four models and predicting overall survival in breast cancer patients. The discriminative ability of the models was evaluated by the concordance index (C-index) and the time-dependent area under the curve (AUC); the calibration ability of the models was evaluated by the Brier score.

RESULTS: The RSF model showed the best discriminative performance among the four models, with 3-year, 5-year, and 10-year time-dependent AUCs of 0.857, 0.838, and 0.781, respectively, and a C-index of 0.827 (0.809, 0.845), which significantly outperformed the Cox-EN model (0.816, p=0.007), the Cox model (0.814, p=0.003), and the SVM model (0.812, p<0.001). The 3-year, 5-year, and 10-year Brier scores of the four models were very close, ranging from 0.027 to 0.094, which indicated that all models had good calibration. In terms of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 features for predicting the prognosis of breast cancer. A final online tool was developed to predict the overall survival of breast cancer patients.

CONCLUSIONS: The RSF model slightly outperformed the other models on discriminative ability, revealing its great potential as an effective approach for survival analysis.

CLINICAL TRIAL: ClinicalTrials.gov, registration number: NCT04996732.
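The Brier score used for calibration can be illustrated with a simplified sketch. It assumes each model outputs a predicted probability of surviving past a fixed horizon (for example 3, 5, or 10 years). The function name `naive_brier_score` and the toy data are hypothetical, and the sketch drops subjects censored before the horizon instead of applying the inverse-probability-of-censoring weights that a full time-dependent Brier score uses, so it should not be read as the authors' implementation.

```python
from typing import Sequence


def naive_brier_score(
    times: Sequence[float],
    events: Sequence[int],
    surv_probs: Sequence[float],
    horizon: float,
) -> float:
    """Mean squared error between predicted survival probability at
    `horizon` and the observed status (1 = still alive, 0 = died).

    Simplification: subjects censored at or before the horizon have an
    unknown status and are excluded, with no censoring weights applied.
    """
    squared_errors = []
    for t, e, s in zip(times, events, surv_probs):
        if t > horizon:
            outcome = 1.0  # followed beyond the horizon, so alive at horizon
        elif e:
            outcome = 0.0  # death observed at or before the horizon
        else:
            continue  # censored before the horizon: status unknown
        squared_errors.append((outcome - s) ** 2)
    if not squared_errors:
        raise ValueError("no subjects with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy data: follow-up in years, 1 = death observed, 0 = censored.
toy_times = [1.0, 2.0, 4.0, 6.0]
toy_events = [1, 0, 1, 0]
toy_surv_at_3y = [0.2, 0.5, 0.9, 0.8]  # predicted P(survive past 3 years)

print(naive_brier_score(toy_times, toy_events, toy_surv_at_3y, horizon=3.0))
```

Lower values are better: 0 means perfect predictions, and a model that always predicts 0.5 scores 0.25, which gives context for the reported range of 0.027 to 0.094.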