2022
DOI: 10.1109/tpami.2021.3126668
Progressive Learning of Category-Consistent Multi-Granularity Features for Fine-Grained Visual Classification

Cited by 66 publications (8 citation statements)
References 39 publications
“…[Excerpt of Table 8: ViT-based methods; all rows use a ViT-B_16 backbone at 448 × 448 input.]

Method            Acc. (%)
ViT [42]          90.8
TransIFC [65]     91.0
TransFG [44]      91.1
TPSKG [46]        91.3
RAMS-Trans [28]   91.3
FFVT [45]         91.4
DCAL [27]         91.4
SIM-Trans [25]    91.5
AFTrans [26]      91.5
IELT [24]         …

The state-of-the-art methods at this stage are organized in Table 8. We can see that our model obtains improvements of 1.7% and 1.0% over the state-of-the-art CNN-based model PRIS [66] and over ViT [42], respectively. These gains are larger than those on CUB-200-2011, indicating that our method does not degrade as the amount of data increases. However, compared with the state-of-the-art ViT-based method TPSKG [46], which uses dual-backbone forward propagation during training and therefore incurs a significant increase in training-time computational complexity, our method trails by only 0.1% at lower computational cost, and it achieves the same performance as RAMS-Trans [28], which uses dual backbone networks for both training and inference.…”
Section: Ablation Experiments and Analysis
confidence: 63%
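The compute claim in the statement above is easy to see schematically: a dual-backbone design of the kind attributed to TPSKG and RAMS-Trans runs the feature extractor twice per image, roughly doubling backbone FLOPs, while a single-stage model pays for one pass. The following is a minimal, hypothetical PyTorch sketch of the two layouts; the class names, the `head` fusion, and the center-crop stand-in for an attention-guided crop are assumptions for exposition, not the actual TPSKG or RAMS-Trans code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleBackboneFGVC(nn.Module):
    """Single-stage layout: one backbone pass per image."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone              # e.g. a ViT-B_16 feature extractor
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))    # 1x backbone FLOPs


class DualBackboneFGVC(nn.Module):
    """Two-pass layout: a second forward pass over a re-cropped input,
    costing roughly twice the backbone FLOPs per image."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        f_global = self.backbone(x)           # pass 1: full image
        h, w = x.shape[-2:]
        # Fixed center crop as a stand-in for an attention-guided crop.
        crop = x[..., h // 4 : h - h // 4, w // 4 : w - w // 4]
        crop = F.interpolate(crop, size=(h, w), mode="bilinear",
                             align_corners=False)
        f_local = self.backbone(crop)         # pass 2: cropped-and-resized image
        return self.head(torch.cat([f_global, f_local], dim=-1))
```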
“…Classification tasks using datasets with fine-grained images can be particularly challenging, and the effects of pre-training with ImageNet were deemed to be small. Therefore, it is essential to evaluate the use of specific refinement techniques for fine-grained images (Du et al., 2021).…”
Section: Discussion
confidence: 99%
“…Additionally, our method is a single-stage model with lower training complexity, while TPSKG is a two-stage model, which further demonstrates the effectiveness of our proposed method.

[Excerpt of a comparison table; the two columns are accuracies (%) on two benchmarks whose names were not preserved in this snippet.]
…              86.5   85.2
SEF [28]       87.3   88.8
Cross-X [29]   87.7   88.9
FDL [30]       89.1   84.9
FBSD [31]      89.8   89.4
API-NET [13]   90.0   90.3
PMG-V2 [19]    90.0   90.7…”
Section: Comparison With the State-of-the-art
confidence: 99%
“…In recent years, a growing number of models and methods have been proposed for FGIR. Du et al. [19] fed a pair of images of the same category into the network and extracted the feature maps at different stages of the network. Based on the comparison between the feature maps of the same category at different stages, they proposed a category-consistency constraint to supervise the network to learn the most discriminative features within a category.…”
Section: Related Work
confidence: 99%
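The category-consistency constraint described in that statement lends itself to a compact sketch: pair two images of the same class, extract stage-wise features from a shared backbone, and penalize the distance between the pair's pooled features at each stage. Below is a minimal, hypothetical PyTorch illustration, not the authors' implementation; the function name, the `stage_features` placeholder, and the squared-L2 penalty are assumptions for exposition.

```python
import torch
import torch.nn.functional as F


def category_consistency_loss(stage_feats_a, stage_feats_b):
    """Hypothetical sketch of a category-consistency constraint:
    given stage-wise feature maps of two same-class images, pull the
    pooled features of each stage together via squared L2 distance."""
    loss = torch.zeros((), device=stage_feats_a[0].device)
    for fa, fb in zip(stage_feats_a, stage_feats_b):
        # Global-average-pool each stage's (B, C, H, W) map to a (B, C) vector.
        va = F.adaptive_avg_pool2d(fa, 1).flatten(1)
        vb = F.adaptive_avg_pool2d(fb, 1).flatten(1)
        loss = loss + F.mse_loss(va, vb)
    return loss / len(stage_feats_a)


# Usage sketch: `stage_features` stands for a backbone that returns a list
# of intermediate feature maps (e.g. outputs of several blocks); x_a, x_b
# are a same-category image pair of shape (B, 3, H, W).
#   feats_a = stage_features(x_a)
#   feats_b = stage_features(x_b)
#   loss = cls_loss + lambda_cc * category_consistency_loss(feats_a, feats_b)
```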