Objective: To compare the differential diagnostic performance of a residual network (ResNet50) model, a random forest (RF) model, and a DS ensemble model in distinguishing papillary thyroid carcinoma (PTC) from other pathological types of thyroid nodules. Methods: This retrospective study analyzed 559 patients with thyroid nodules. Thyroid pathological images and auxiliary examination results (laboratory and ultrasound findings) were collected to construct an image dataset and a text dataset. The pathological image dataset was used to train the ResNet50 model, the text dataset was used to train the RF model, and the DS ensemble model was constructed from the outputs of the two models. The three models were then compared on their ability to distinguish PTC from other types of thyroid nodules. Results: Of the three models, the DS ensemble model had the highest sensitivity (85.87%), specificity (97.18%), accuracy (93.77%), and area under the receiver operating characteristic curve (0.982). Conclusions: Compared with the ResNet50 model trained only on pathological images and the RF model trained only on text information, the DS ensemble model showed better diagnostic value for PTC.
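
The abstract does not spell out how the two model outputs are fused. As a minimal sketch only, assuming "DS" denotes Dempster-Shafer evidence theory and that each model emits a probability that a nodule is PTC, the fusion step could look like the following. The function name `dempster_combine` and the example probabilities are hypothetical and are not taken from the study.

```python
def dempster_combine(p_image: float, p_text: float) -> float:
    """Fuse two PTC probabilities with Dempster's rule of combination.

    Assumption (not stated in the abstract): each model assigns all of its
    belief mass to the two singleton classes {PTC} and {other}, so the rule
    reduces to a normalized product of the agreeing masses.
    """
    agree_ptc = p_image * p_text                    # both sources support PTC
    agree_other = (1.0 - p_image) * (1.0 - p_text)  # both sources support "other"
    total = agree_ptc + agree_other                 # equals 1 - K, K = conflicting mass
    if total == 0.0:
        raise ValueError("Sources are in total conflict; Dempster's rule is undefined.")
    return agree_ptc / total


# Hypothetical outputs: image model says 0.80, text model says 0.70.
fused = dempster_combine(0.80, 0.70)
print(f"Fused PTC belief: {fused:.3f}")
```

When the two sources agree, the fused belief is stronger than either input; when one source is uninformative (0.5), the fused belief equals the other source's value.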