Objective To construct deep learning (DL) models to improve the accuracy and efficiency of thyroid disease diagnosis by thyroid scintigraphy. Methods We constructed DL models with AlexNet, VGGNet, and ResNet. The models were trained separately with transfer learning. We measured each model’s performance with six indicators: recall, precision, negative predictive value (NPV), specificity, accuracy, and F1-score. We also compared the diagnostic performances of first- and third-year nuclear medicine (NM) residents with assistance from the best-performing DL-based model. The Kappa coefficient and average classification time of each model were compared with those of two NM residents. Results The recall, precision, NPV, specificity, accuracy, and F1-score of the three models ranged from 73.33% to 97.00%. The Kappa coefficient of all three models was >0.710. All models performed better than the first-year NM resident but not as well as the third-year NM resident in terms of diagnostic ability. However, the ResNet model provided “diagnostic assistance” to the NM residents. The models provided results at speeds 400 to 600 times faster than the NM residents. Conclusion DL-based models perform well in diagnostic assessment by thyroid scintigraphy. These models may serve as tools for NM residents in the diagnosis of Graves’ disease and subacute thyroiditis.