Background
Breast cancer is the most common cancer and the most common cause of cancer death in women. Although survival rates have improved, unmet psychosocial needs remain a challenge because quality of life (QoL) and QoL-related factors change over time. In addition, traditional statistical models are limited in their ability to identify factors associated with QoL over time, particularly across the physical, psychological, economic, spiritual, and social dimensions.
Objective
This study aimed to identify patient-centered factors associated with QoL among patients with breast cancer using a machine learning (ML) algorithm to analyze data collected along different survivorship trajectories.
Methods
The study used 2 data sets. The first data set was the cross-sectional survey data from the Breast Cancer Information Grand Round for Survivorship (BIG-S) study, which recruited consecutive breast cancer survivors who visited the outpatient breast cancer clinic at the Samsung Medical Center in Seoul, Korea, between 2018 and 2019. The second data set was the longitudinal cohort data from the Beauty Education for Distressed Breast Cancer (BEST) cohort study, which was conducted at 2 university-based cancer hospitals in Seoul, Korea, between 2011 and 2016. QoL was measured using the European Organization for Research and Treatment of Cancer QoL Questionnaire Core 30. The final model was selected based on the highest mean area under the receiver operating characteristic curve (AUC). Feature importance was interpreted using Shapley Additive Explanations (SHAP). The analyses were performed using the Python 3.7 programming environment (Python Software Foundation).
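As a rough illustration of the selection rule described above, the following sketch computes AUC from its rank-based definition and picks the candidate with the highest mean AUC. It is not the study's code: the abstract does not say what the mean is taken over, so averaging across validation folds is an assumption, and the candidate names and per-fold values are made up for the example.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def select_by_mean_auc(fold_aucs):
    """Return (name, mean AUC) of the candidate with the highest mean AUC."""
    means = {name: mean(aucs) for name, aucs in fold_aucs.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Hypothetical per-fold AUCs for three made-up candidates (not study results).
fold_aucs = {
    "model_a": [0.80, 0.82, 0.81],
    "model_b": [0.83, 0.84, 0.82],
    "model_c": [0.79, 0.85, 0.80],
}
best_name, best_auc = select_by_mean_auc(fold_aucs)
```

Averaging before comparing keeps a single favorable fold (such as the 0.85 of `model_c`) from deciding the choice.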
Results
The study included 6265 breast cancer survivors in the training data set and 432 patients in the validation set. The mean age was 50.6 (SD 8.66) years and 46.8% (n=2004) had stage 1 cancer. In the training data set, 48.3% (n=3026) of survivors had poor QoL. The study developed ML models for QoL prediction based on 6 algorithms. Performance was good for all survival trajectories: overall (AUC 0.823), baseline (AUC 0.835), within 1 year (AUC 0.860), between 2 and 3 years (AUC 0.808), between 3 and 4 years (AUC 0.820), and between 4 and 5 years (AUC 0.826). Emotional and physical functions were the most important features before surgery and within 1 year after surgery, respectively. Fatigue was the most important feature between 1 and 4 years. Regardless of the survival period, hopefulness was the feature with the greatest influence on QoL. External validation of the models showed good performance, with AUCs between 0.770 and 0.862.
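The feature rankings reported above rest on Shapley values, which attribute a model's output to each feature by averaging that feature's marginal contribution over all feature coalitions. The sketch below computes exact Shapley values for a toy value function by brute-force enumeration. It is meant only to illustrate the idea: the feature names and numbers are arbitrary placeholders rather than study data, and practical SHAP analyses rely on approximations instead of full enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_value(coalition):
    """Hypothetical model output given the set of features that are present."""
    score = 0.0
    if "feature_a" in coalition:
        score += 2.0
    if "feature_b" in coalition:
        score += 3.0
    if "feature_a" in coalition and "feature_b" in coalition:
        score += 1.0  # interaction term, split equally by Shapley values
    return score


features = ["feature_a", "feature_b", "feature_c"]
phi = shapley_values(toy_value, features)
```

In this toy case the attributions sum to the difference between the output with all features present and the output with none, which is the additivity property that makes Shapley-based rankings comparable across features.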
Conclusions
The study identified important factors associated with QoL among breast cancer survivors across different survival trajectories. Understanding how these factors change over time could enable more precise and timely interventions, and potentially prevent or alleviate QoL-related issues for patients. The good performance of our ML models in both the training and external validation sets suggests that this approach could be used to identify patient-centered factors and improve survivorship care.