Objective
Indeterminate thyroid nodules (ITN) are common and often lead to diagnostic surgery that is sometimes unnecessary. We aimed to evaluate the performance of two machine learning (ML) methods, based on routinely available features, in predicting the risk of malignancy (RM) of ITN.
Design
Multicenter retrospective diagnostic cohort study conducted between 2010 and 2020.
Methods
Adult patients who underwent surgery for at least one Bethesda III-V thyroid nodule (TN) and had fully available medical records were included. Of the 7,917 records reviewed, 1,288 patients with 1,335 TN met the eligibility criteria. Patients were divided into a training cohort (940 TN) and a validation cohort (395 TN). We evaluated the diagnostic performance of a multivariate logistic regression model (LR) with its nomogram, and of a random forest model (RF), in predicting the nature and RM of a TN. All available clinical, biological, ultrasound, and cytological data were collected and used to construct the two algorithms.
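As a rough illustration of how an LR model, and the nomogram derived from it, turns patient features into an RM, the sketch below applies the logistic function to a weighted sum of features. The intercept, the coefficients, and the feature name `suspicious_ultrasound` are hypothetical placeholders chosen for readability; they are not the fitted values or the variable set of this study.

```python
import math

# Hypothetical values for illustration only; NOT the study's fitted model.
INTERCEPT = -2.0
COEFFICIENTS = {
    "bethesda_v": 1.5,             # 1 if cytology is Bethesda V, else 0
    "suspicious_ultrasound": 1.0,  # 1 if ultrasound looks suspicious, else 0
}


def malignancy_risk(features):
    """Return the logistic-model probability for a dict of 0/1 features."""
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    # Logistic (sigmoid) link maps the linear score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


low = malignancy_risk({"bethesda_v": 0, "suspicious_ultrasound": 0})
high = malignancy_risk({"bethesda_v": 1, "suspicious_ultrasound": 1})
print(f"risk without risk factors: {low:.3f}")
print(f"risk with both risk factors: {high:.3f}")
```

A nomogram is a graphical form of the same calculation: each feature contributes points proportional to its coefficient, and the total maps to a probability.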
Results
There were 253 (19%), 693 (52%) and 389 (29%) TN classified as Bethesda III, IV and V, respectively, with an overall RM of 35%. Both cohorts were well balanced for baseline characteristics. Both models were evaluated on the validation cohort. For the LR model, specificity, sensitivity, positive predictive value, negative predictive value and area under the receiver operating characteristic curve were 90%, 57.3%, 73.4%, 81.4% and 84% (95% CI: 78.5–89.5%), respectively; for the RF model, they were 87.6%, 54.7%, 68.1%, 80% and 82.6% (95% CI: 77.4–87.9%).
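For readers less familiar with these measures, the following minimal sketch shows how specificity, sensitivity, positive predictive value and negative predictive value follow from a 2x2 confusion matrix. The counts are made up for the example and are not the study's validation data; the parameter names are spelled out because "TN" in this abstract denotes thyroid nodule, not true negative.

```python
def diagnostic_metrics(true_pos, false_pos, true_neg, false_neg):
    """Compute standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        # Proportion of malignant nodules correctly flagged as malignant.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of benign nodules correctly flagged as benign.
        "specificity": true_neg / (true_neg + false_pos),
        # Probability of malignancy given a positive prediction.
        "ppv": true_pos / (true_pos + false_pos),
        # Probability of benignity given a negative prediction.
        "npv": true_neg / (true_neg + false_neg),
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(true_pos=40, false_pos=10, true_neg=90, false_neg=60)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Unlike sensitivity and specificity, the predictive values depend on the prevalence of malignancy in the cohort, so they apply only to populations with a similar RM.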
Conclusions
Our ML models performed well in predicting the nature of Bethesda III-V TN. In addition, our freely available online nomogram helped to refine the RM, identifying, in up to a third of ITN, low-risk nodules that may be suitable for surveillance, and may thus reduce the number of unnecessary surgeries.