Introduction

Ultrasound is instrumental in the early detection of thyroid nodules, which is crucial for appropriate management and favorable outcomes. However, clinical guidelines for the judicious use of thyroid ultrasonography in routine screening are lacking. Machine learning (ML) has been increasingly applied to big data to predict clinical outcomes. This study aims to leverage an ML approach to assess the risk of thyroid nodules based on common clinical features.

Methods

Data were sourced from a Chinese cohort undergoing routine physical examinations, including thyroid ultrasonography, between 2013 and 2023. Models were established to predict the 3-year risk of thyroid nodules based on patients' baseline characteristics and laboratory tests. Four ML algorithms, including logistic regression, random forest, extreme gradient boosting, and light gradient boosting machine, were trained and tested using fivefold cross-validation. The importance of each feature was measured by its permutation score. A nomogram was established to facilitate risk assessment in clinical settings.

Results

The final dataset comprised 4,386 eligible subjects. Thyroid nodules were detected in 54.8% (n = 2,404) of individuals within the 3-year observation period. All ML models significantly outperformed the baseline regression model, successfully predicting the occurrence of thyroid nodules in approximately two-thirds of individuals. Age, high-density lipoprotein, fasting blood glucose, and creatinine levels exhibited the highest impact on the outcome in these models. The nomogram showed consistency and validity, providing greater net benefits for clinical decision-making than alternative strategies.

Conclusion

This study demonstrates the viability of an ML-based approach in predicting the occurrence of thyroid nodules. The findings highlight the potential of ML models in identifying high-risk individuals for personalized screening, thereby guiding the judicious use of ultrasound in this context.
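The workflow described in the Methods (fit classifiers under fivefold cross-validation, then rank features by permutation score) can be sketched as follows. This is a minimal, self-contained illustration on synthetic data, not the study's pipeline: the feature names, data-generating process, and the from-scratch logistic model are all assumptions for demonstration; the study itself used four algorithms on real cohort data.

```python
import math
import random

random.seed(0)

# Hypothetical feature subset, loosely echoing the abstract's top predictors.
FEATURES = ["age", "hdl", "fbg", "creatinine"]

def make_synthetic(n=500):
    """Toy records in which 'age' dominates the outcome by construction."""
    rows = []
    for _ in range(n):
        x = [random.gauss(0, 1) for _ in FEATURES]
        logit = 1.5 * x[0] + 0.3 * x[1]
        y = 1 if random.random() < 1 / (1 + math.exp(-logit)) else 0
        rows.append((x, y))
    return rows

def fit_logistic(train, epochs=200, lr=0.1):
    """Plain stochastic-gradient logistic regression (no regularization)."""
    w, b = [0.0] * len(FEATURES), 0.0
    for _ in range(epochs):
        for x, y in train:
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def accuracy(model, rows):
    w, b = model
    hits = sum(
        1 for x, y in rows
        if (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
    )
    return hits / len(rows)

def permutation_importance(model, rows, j):
    """Drop in accuracy when feature j is shuffled across rows."""
    base = accuracy(model, rows)
    col = [x[j] for x, _ in rows]
    random.shuffle(col)
    permuted = [(x[:j] + [v] + x[j + 1:], y) for (x, y), v in zip(rows, col)]
    return base - accuracy(model, permuted)

data = make_synthetic()
fold = len(data) // 5
scores = []
for k in range(5):  # fivefold cross-validation
    test = data[k * fold:(k + 1) * fold]
    train = data[:k * fold] + data[(k + 1) * fold:]
    scores.append(accuracy(fit_logistic(train), test))
print(f"mean CV accuracy: {sum(scores) / 5:.2f}")

model = fit_logistic(data)
for j, name in enumerate(FEATURES):
    print(f"{name}: permutation importance {permutation_importance(model, data, j):+.3f}")
```

A larger importance score means shuffling that feature degrades accuracy more, i.e., the model relies on it more heavily; here "age" should rank first because the synthetic outcome was built that way.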