Distant metastasis of thyroid cancer often indicates a poor prognosis, so it is important to identify, as early as possible, patients who have developed distant metastasis or are at high risk of it. This study aims to predict distant metastasis of thyroid cancer by constructing machine learning models, providing a reference for clinical diagnosis and treatment.
Materials & Methods: Data on the demographic and clinicopathological characteristics of thyroid cancer patients diagnosed between 2010 and 2015 were extracted from the National Institutes of Health (NIH) Surveillance, Epidemiology, and End Results (SEER) database. Univariate and multivariate logistic regression models were used to screen for independent risk factors. Seven machine learning models, Decision Tree (DT), ElasticNet (ENET), Logistic Regression (LR), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Multilayer Perceptron (MLP), and Radial Basis Function Support Vector Machine (RBFSVM), were compared and evaluated using the following metrics: area under the receiver operating characteristic curve (AUC), calibration curve, decision curve analysis (DCA), sensitivity (also called recall), specificity, precision, accuracy, and F1 score. Interpretable machine learning was used to identify possible correlations between the variables and distant metastasis.
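The threshold-based metrics listed above (sensitivity, specificity, precision, accuracy, and F1 score) all derive from the confusion matrix. The following is a minimal sketch of those definitions, not the study's actual evaluation code. The function name `classification_metrics` and the toy labels are hypothetical, with 1 assumed to denote distant metastasis and 0 its absence. AUC, calibration curves, and DCA are not covered here because they require predicted probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (assumed: 1 = distant metastasis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)      # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, len(pairs))
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy labels for illustration only; these are not SEER data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because distant metastasis is typically a minority class, accuracy alone can be misleading, which is one reason to report sensitivity, precision, and F1 score alongside it.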
Results: Independent risk factors for distant metastasis, including age, gender, race, marital status, histological type, capsular invasion, and number of lymph node metastases, were identified by multivariate regression analysis. Among the seven machine learning algorithms, RF performed best, with an AUC of 0.948, sensitivity of 0.919, accuracy of 0.845, and F1 score of 0.886 in the training set, and an AUC of 0.960, sensitivity of 0.929, accuracy of 0.906, and F1 score of 0.908 in the test set.
Conclusions: The machine learning model constructed in this study can aid the early diagnosis of distant metastasis of thyroid cancer and help physicians make better decisions about medical interventions.