Objective:
This study aimed to construct a two-stage deep learning framework to segment and recognize tongue images and enhance the accuracy and efficiency of artificial intelligence (AI) tongue diagnosis in traditional Chinese medicine (TCM).
Materials and Methods:
A total of 594 tongue images of adequate quality were used to construct the AI models. In the first stage, a multi-attention UNet model performed semantic segmentation to separate the tongue body from the background. In the second stage, a residual network classified seven important tongue characteristics.
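The abstract does not state how the segmentation output is handed to the classifier. One plausible handoff, shown here only as a minimal sketch under that assumption, is to zero out background pixels and crop the image to the bounding box of the predicted tongue mask. The function name `crop_to_mask` is hypothetical, and plain Python lists are used so the example has no dependencies; a real pipeline would operate on image arrays.

```python
def crop_to_mask(image, mask):
    """Zero out background pixels and crop to the mask's bounding box.

    image: H x W nested list of pixel values.
    mask:  H x W nested list of 0/1, where 1 marks a tongue pixel.
    Hypothetical helper; the handoff between the two stages is an assumption.
    """
    # Rows and columns that contain at least one tongue pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no tongue pixels")

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]

    # Keep pixels inside the mask, set the rest of the crop to 0.
    return [
        [image[r][c] if mask[r][c] else 0 for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


# Toy 3 x 4 "image" and a predicted mask covering three pixels.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]

cropped = crop_to_mask(image, mask)
print(cropped)  # [[6, 7], [0, 11]]
```

In this sketch the cropped region would then be resized and passed to the second-stage classifier.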
Results:
The segmentation model achieved 96.12% mean intersection over union, 98.91% mean pixel accuracy, and 97.15% mean precision. The classification models were robust across the seven characteristics, with an overall accuracy above 80%. These results indicate that the constructed models have potential applications in TCM.
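For readers unfamiliar with the three segmentation metrics, the sketch below computes them from a pixel-level confusion matrix. It assumes the commonly used definitions (per-class IoU = TP / (TP + FP + FN), per-class pixel accuracy = TP / (TP + FN), per-class precision = TP / (TP + FP), each averaged over classes); the abstract does not spell out the exact formulas used in the study, and the function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _ratio(num, den):
    # Return 0.0 for an empty class instead of dividing by zero.
    return num / den if den else 0.0


def segmentation_metrics(cm):
    """Class-averaged IoU, pixel accuracy, and precision (assumed definitions)."""
    n = len(cm)
    ious, accs, precs = [], [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class otherwise
        ious.append(_ratio(tp, tp + fp + fn))
        accs.append(_ratio(tp, tp + fn))
        precs.append(_ratio(tp, tp + fp))
    return {
        "mean_iou": sum(ious) / n,
        "mean_pixel_accuracy": sum(accs) / n,
        "mean_precision": sum(precs) / n,
    }


# Toy example with flattened pixel labels: 0 = background, 1 = tongue.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]

metrics = segmentation_metrics(confusion_matrix(y_true, y_pred, num_classes=2))
print(metrics)
```

In the toy example one background pixel is mislabeled as tongue, so the background class has IoU 3/4 and the tongue class has IoU 4/5.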
Conclusions:
This two-stage approach streamlines the analysis of tongue images and sets a new benchmark for accuracy in medical image processing for TCM tongue diagnosis.