Objective
The aim of this study was to establish an ensemble learning model based on clinicopathological parameters and ultrasound radiomics for assessing the risk of metastasis to lateral cervical lymph nodes with a short-axis diameter of less than 8 mm (hereafter "small lymph nodes") in patients with papillary thyroid cancer (PTC), thereby guiding the choice of surgical approach.
Methods
This retrospective analysis included 454 patients with papillary thyroid carcinoma who underwent total thyroidectomy with either lateral neck lymph node dissection or intraoperative frozen-section biopsy of lymph nodes at the First Hospital of China Medical University between January 2015 and April 2022. The patients were split in a ratio of 8:2 into a training set (n = 362, 80%) and a test set (n = 92, 20%). Clinicopathological features and ultrasound-based radiomics features were extracted, and feature selection was performed with recursive feature elimination (RFE). On each feature set, we built ensemble learning models using random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), gradient boosting decision tree (GBDT), and light gradient boosting machine (LightGBM), yielding clinical models, radiomics models, and clinical-radiomic models. By comparing performance metrics including the area under the curve (AUC), accuracy (ACC), specificity (SPE), precision (PRE), recall, F1 score, and mean squared error (MSE), among others, we identified the optimal model and interpreted its predictions with SHapley Additive exPlanations (SHAP).
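The general shape of this workflow (stratified 8:2 split, RFE, then several tree ensembles compared on the held-out set) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's code: scikit-learn is assumed to be installed; the data are synthetic stand-ins generated with `make_classification`; the abstract does not say which estimator drove RFE, so a random forest is assumed; `n_features_to_select=8`, `step=5`, and `n_estimators=100` are hypothetical values; and only the two scikit-learn learners (RF and GBDT) are shown. Fitting RFE on the training set alone is a choice made in this sketch to avoid test-set leakage, not something the abstract states.

```python
# Sketch only: synthetic data, hypothetical hyperparameters, scikit-learn assumed installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 454 synthetic "patients" with ~25% negatives (hypothetical stand-in, NOT study data)
X, y = make_classification(
    n_samples=454, n_features=60, n_informative=8,
    weights=[0.25, 0.75], random_state=0,
)

# 8:2 split, stratified so training and test sets keep the same class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# RFE: repeatedly refit the forest and drop the least important features.
# Fitted on the training set only, so the test set plays no part in selection.
selector = RFE(
    RandomForestClassifier(n_estimators=100, random_state=0),
    n_features_to_select=8,  # hypothetical target size
    step=5,                  # drop 5 features per iteration
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
selected = [i for i, keep in enumerate(selector.support_) if keep]

# Compare candidate ensembles on the held-out set by AUC
candidates = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
}
test_auc = {}
for name, model in candidates.items():
    model.fit(X_train_sel, y_train)
    test_auc[name] = roc_auc_score(y_test, model.predict_proba(X_test_sel)[:, 1])

best = max(test_auc, key=test_auc.get)
print(f"selected feature indices: {selected}")
print({name: round(auc, 2) for name, auc in test_auc.items()}, "best:", best)
```

XGBoost, CatBoost, and LightGBM all ship scikit-learn-compatible classifiers (`XGBClassifier`, `CatBoostClassifier`, `LGBMClassifier`), so they could be added to `candidates` in the same way if those packages are installed.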
Results
A total of 454 patients were included: 342 PTC patients had small lymph node metastasis in the lateral neck region, and 112 did not. Of the 1035 features initially considered, feature selection retained 10 clinical features, 8 radiomics features, and 17 combined clinical-radiomics features. Five ensemble learning models were built on each of these three feature sets, giving fifteen models in total. In the test set, RF performed best among the clinical models (AUC = 0.72, F1 = 0.75, Jaccard index [Jaccard] = 0.60, and recall = 0.84), and CatBoost performed best among the radiomics models (AUC = 0.91, balanced accuracy [BA] = 0.83, and SPE = 0.76). Among the clinical-radiomic models, CatBoost performed best (AUC = 0.93, ACC = 0.88, BA = 0.87, F1 = 0.91, SPE = 0.83, PRE = 0.88, Jaccard = 0.83, and recall = 0.92). SHAP visualization of the clinical-radiomic CatBoost model showed that central lymph node metastasis (CLNM), Origin_Shape_Sphericity (o_shap_sphericity), LoG-sigma3_first order_Skewness (log-3_fo_skewness), wavelet-HH_first order_Skewness (w-HH_fo_skewness), and squareroot_gldm_DependenceNonUniformityNormalized (sqr_gldm_DNUN) had the greatest impact on predicting lateral cervical small lymph node metastasis in PTC patients.
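The threshold-based metrics reported above can all be derived from a test-set confusion matrix, and AUC from the predicted scores. The following pure-Python sketch assumes "metastasis" is the positive class; the function names are hypothetical and the confusion-matrix counts used in the tests are invented for illustration, not taken from the study. It also encodes the identity F1 = 2J/(1 + J), which holds when F1 and the Jaccard index J are computed for the positive class of the same confusion matrix; the reported pairs (J = 0.60 with F1 = 0.75, and J = 0.83 with F1 = 0.91) are consistent with it to two decimals.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Threshold-based metrics from a 2x2 confusion matrix (positive = metastasis)."""
    recall = tp / (tp + fn)              # sensitivity
    spe = tn / (tn + fp)                 # specificity
    pre = tp / (tp + fp)                 # precision (positive predictive value)
    acc = (tp + tn) / (tp + fp + tn + fn)
    ba = (recall + spe) / 2              # balanced accuracy
    f1 = 2 * pre * recall / (pre + recall)
    jaccard = tp / (tp + fp + fn)        # Jaccard index of the positive class
    return {"ACC": acc, "BA": ba, "SPE": spe, "PRE": pre,
            "Recall": recall, "F1": f1, "Jaccard": jaccard}


def f1_from_jaccard(j: float) -> float:
    """F1 = 2J / (1 + J) for a single binary confusion matrix."""
    return 2 * j / (1 + j)


def auc_mann_whitney(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 92-patient test set (invented counts)
m = binary_metrics(tp=60, fp=8, tn=16, fn=8)
print({k: round(v, 2) for k, v in m.items()})
print(round(f1_from_jaccard(0.83), 2))  # 0.91, matching the reported F1 for Jaccard = 0.83
```

The pairwise AUC loop is O(n_pos × n_neg), which is fine for a test set of this size; a rank-based formula gives the same value faster on large samples.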
Conclusions
(1) Among the ensemble learning models built on clinicopathological and radiomics features to predict lateral small lymph node metastasis in PTC, the clinical-radiomic CatBoost model performed best. (2) SHAP visualizes how individual clinical and radiomics features drive the model's predictions, making the model interpretable. (3) The combined CatBoost model can improve diagnostic accuracy for suspicious lymph nodes with a short-axis diameter of less than 8 mm, for which accurate puncture biopsy results are difficult to obtain. Adding radiomics features yields more accurate predictions than clinical data alone, which helps to define the extent of surgery and supports individualized clinical decision-making.