Background: Early identification of patients at high risk for psychological distress allows for timely intervention and improved prognosis. Current methods for predicting psychological distress in lung cancer patients using readily available data are limited.
Objective: This study aimed to develop a robust machine learning (ML) model for predicting psychological distress risk in lung cancer patients.
Methods: Data were collected from 342 lung cancer patients in a cross-sectional study. The Least Absolute Shrinkage and Selection Operator (LASSO) was used for feature selection. Model training and validation were conducted using bootstrap resampling. Five-fold cross-validation was used to evaluate the models and tune their hyperparameters. Feature importance was assessed using the SHapley Additive exPlanations (SHAP) method.
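The bootstrap evaluation named in the Methods can be illustrated with a minimal sketch. It assumes a percentile bootstrap over the area under the receiver operating characteristic curve (AUROC); the abstract does not specify the exact resampling scheme, and the labels and scores below are made-up toy values, not study data. The function names `auroc` and `bootstrap_auroc_ci` are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for AUROC from resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

# Toy example (illustrative values only): 1 = distress, 0 = no distress.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
point_estimate = auroc(labels, scores)
ci_low, ci_high = bootstrap_auroc_ci(labels, scores)
print(point_estimate, ci_low, ci_high)
```

In this toy case 15 of the 16 positive/negative pairs are ranked correctly, so the point estimate is 0.9375. The interval is wide because the sample is tiny.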
Results: Seven independent predictors emerged as the most valuable features. Area under the receiver operating characteristic curve (AUROC) values ranged from 0.749 to 1.000 across the eight ML algorithms. The extreme gradient boosting (XGBoost) algorithm achieved the best performance, with AUROC values of 0.988, 0.945, and 0.922 in the training, validation, and test sets, respectively. SHAP analysis identified the model’s explanatory variables and quantified their contributions to psychological distress risk. A web-based tool for calculating psychological distress risk was developed.
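To make the idea of per-feature contributions concrete, the sketch below computes exact Shapley values by enumerating every feature coalition. It is not the study's model or the SHAP library, which relies on faster approximations for tree ensembles. The three features `x1`, `x2`, `x3` and the toy risk function are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

def toy_risk(present):
    """Hypothetical risk score given the set of features that are 'switched on'."""
    risk = 0.10                      # baseline risk with no features present
    if "x1" in present:
        risk += 0.20
    if "x2" in present:
        risk += 0.10
    if "x1" in present and "x2" in present:
        risk += 0.10                 # interaction, split evenly between x1 and x2
    return risk                      # x3 has no effect in this toy function

features = ["x1", "x2", "x3"]
phi = shapley_values(toy_risk, features)
print(phi)
```

The contributions sum to the difference between the full-feature risk and the baseline (0.50 - 0.10 = 0.40), which is the additivity property that SHAP plots rely on.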
Limitations: The cross-sectional design limits causal inference and may introduce selection bias. Some potentially important variables, such as mindfulness, were not included. The model’s robustness has not been confirmed by external validation.
Conclusion: The XGBoost classifier demonstrated strong discriminative performance, and the web-based risk calculator offers health practitioners an easy-to-use tool for formulating early prevention and intervention strategies.