Objective
We aim to develop a deep learning architecture tailored to the analysis and classification of tongue features, including color, shape, and coating. Conventional methods built on architectures such as VGG or ResNet are large and therefore prone to overfitting; our proposed method is designed to address the challenges arising from this model size. Through this research, we aim to advance techniques for tongue feature recognition, ultimately supporting more precise diagnoses and better patient rehabilitation in Traditional Chinese Medicine (TCM).

Methods
We introduce TGANet (Tongue Feature Attention Network) to enhance model performance. TGANet uses the first five convolutional blocks of a pre-trained VGG16 as its backbone and integrates an attention mechanism into this backbone. The attention mechanism is intended to mimic human cognitive attention by concentrating the model's weighting on the pivotal regions of the image. The attention weights allocated during learning also help in interpreting the causal relationships behind the model's decisions.

Results
Experimental results demonstrate that TGANet outperforms the baseline models VGG16, ResNet18, and TSC-WNet in terms of accuracy, precision, F1 score, and AUC. In addition, visualizing the attention weights provides a more intuitive and meaningful understanding of how tongue feature classification models reach their predictions.

Conclusion
TGANet presents an effective approach to tongue feature classification, addressing challenges associated with model size and overfitting. By combining the attention mechanism with a pre-trained VGG16 backbone, TGANet achieves superior performance metrics and enhances the interpretability of the model's decision-making process.
The visualization of attention weights contributes to a more intuitive understanding of the classification process, making TGANet a promising tool in tongue diagnosis and rehabilitation.
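
The abstract does not specify the exact form of the attention mechanism, so the following is only a minimal sketch of one common variant, spatial attention pooling, to illustrate how attention weights over image regions can be computed and later visualized. It assumes the backbone output has been flattened into a list of per-location feature vectors; the function name `spatial_attention_pool`, the scoring vector `score_weights`, and the toy values are hypothetical and are not taken from TGANet.

```python
import math

def spatial_attention_pool(features, score_weights):
    """Illustrative spatial attention (an assumption, not TGANet's actual layer).

    features: list of N location vectors, each of length C, standing in for a
        flattened H*W feature map from a convolutional backbone.
    score_weights: length-C vector of hypothetical learned parameters.
    Returns (attention, pooled): N attention weights and a length-C vector.
    """
    # One scalar relevance score per spatial location.
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in features]
    # Numerically stable softmax so the weights over locations sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # Attention-weighted average of the location features.
    channels = len(features[0])
    pooled = [
        sum(a * f[c] for a, f in zip(attention, features))
        for c in range(channels)
    ]
    return attention, pooled

# Toy 2x2 feature map with 3 channels, flattened to 4 locations.
features = [
    [0.1, 0.0, 0.2],
    [2.0, 1.5, 0.3],
    [0.0, 0.1, 0.1],
    [0.2, 0.3, 0.0],
]
score_weights = [1.0, 1.0, 0.0]
attention, pooled = spatial_attention_pool(features, score_weights)
```

Reshaping `attention` back to the H×W grid gives a heat map that can be overlaid on the input image, which is the kind of attention-weight visualization the Results and Conclusion refer to.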