ObjectivesTo apply machine learning to extract radiomics features from thyroid two-dimensional ultrasound (2D-US) combined with contrast-enhanced ultrasound (CEUS) images to classify and predict benign and malignant thyroid nodules, classified according to the Chinese version of the thyroid imaging reporting and data system (C-TIRADS) as category 4.Materials and methodsThis retrospective study included 313 pathologically diagnosed thyroid nodules (203 malignant and 110 benign). Two 2D-US images and five CEUS key frames (“2nd second after the arrival time” frame, “time to peak” frame, “2nd second after peak” frame, “first-flash” frame, and “second-flash” frame) were selected to manually label the region of interest using the “Labelme” tool. A total of 7 images of each nodule and their annotates were imported into the Darwin Research Platform for radiomics analysis. The datasets were randomly split into training and test cohorts in a 9:1 ratio. Six classifiers, namely, support vector machine, logistic regression, decision tree, random forest (RF), gradient boosting decision tree and extreme gradient boosting, were used to construct and test the models. Performance was evaluated using a receiver operating characteristic curve analysis. The area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy (ACC), and F1-score were calculated. One junior radiologist and one senior radiologist reviewed the 2D-US image and CEUS videos of each nodule and made a diagnosis. We then compared their AUC and ACC with those of our best model.ResultsThe AUC of the diagnosis of US, CEUS and US combined CEUS by junior radiologist and senior radiologist were 0.755, 0.750, 0.784, 0.800, 0.873, 0.890, respectively. The RF classifier performed better than the other five, with an AUC of 1 for the training cohort and 0.94 (95% confidence interval 0.88–1) for the test cohort. The sensitivity, specificity, accuracy, PPV, NPV, and F1-score of the RF model in the test cohort were 0.82, 0.93, 0.90, 0.85, 0.92, and 0.84, respectively. The RF model with 2D-US combined with CEUS key frames achieved equivalent performance as the senior radiologist (AUC: 0.94 vs. 0.92, P = 0.798; ACC: 0.90 vs. 0.92) and outperformed the junior radiologist (AUC: 0.94 vs. 0.80, P = 0.039, ACC: 0.90 vs. 0.81) in the test cohort.ConclusionsOur model, based on 2D-US and CEUS key frames radiomics features, had good diagnostic efficacy for thyroid nodules, which are classified as C-TIRADS 4. It shows promising potential in assisting less experienced junior radiologists.