2022
DOI: 10.1109/tpami.2021.3126668
Progressive Learning of Category-Consistent Multi-Granularity Features for Fine-Grained Visual Classification

Cited by 66 publications (8 citation statements)
References 39 publications
“…[Excerpt of Table 8: ViT-based methods; all rows use a ViT-B_16 backbone at 448 × 448 input.]

Method            Acc. (%)
ViT [42]          90.8
TransIFC [65]     91.0
TransFG [44]      91.1
TPSKG [46]        91.3
RAMS-Trans [28]   91.3
FFVT [45]         91.4
DCAL [27]         91.4
SIM-Trans [25]    91.5
AFTrans [26]      91.5
IELT [24]         …

The state-of-the-art methods at this stage are organized in Table 8. We can see that our model obtains improvements of 1.7% and 1.0% over the state-of-the-art CNN-based model PRIS [66] and over ViT [42], respectively. These gains are larger than those on CUB-200-2011, indicating that our method does not degrade as the amount of data increases. However, compared with the state-of-the-art ViT-based method TPSKG [46], which uses dual-backbone forward propagation during training and therefore incurs a significant increase in training-time computational complexity, our method trails by only 0.1% at lower computational cost, and it achieves the same performance as RAMS-Trans [28], which uses dual backbone networks for both training and inference.…”
Section: Ablation Experiments and Analysis
confidence: 63%
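The compute claim in the statement above is easy to see schematically: a dual-backbone design of the kind attributed to TPSKG and RAMS-Trans runs the feature extractor twice per image, roughly doubling backbone FLOPs, while a single-stage model pays for one pass. The following is a minimal, hypothetical PyTorch sketch of the two layouts; the class names, the `head` fusion, and the center-crop stand-in for an attention-guided crop are assumptions for exposition, not the actual TPSKG or RAMS-Trans code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SingleBackboneFGVC(nn.Module):
    """Single-stage layout: one backbone pass per image."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone              # e.g. a ViT-B_16 feature extractor
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))    # 1x backbone FLOPs


class DualBackboneFGVC(nn.Module):
    """Two-pass layout: a second forward pass over a re-cropped input,
    costing roughly twice the backbone FLOPs per image."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        f_global = self.backbone(x)           # pass 1: full image
        h, w = x.shape[-2:]
        # Fixed center crop as a stand-in for an attention-guided crop.
        crop = x[..., h // 4 : h - h // 4, w // 4 : w - w // 4]
        crop = F.interpolate(crop, size=(h, w), mode="bilinear",
                             align_corners=False)
        f_local = self.backbone(crop)         # pass 2: cropped-and-resized image
        return self.head(torch.cat([f_global, f_local], dim=-1))
```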
“…Classification tasks using datasets with fine-grained images can be particularly challenging, and the effects of pre-training with ImageNet were deemed to be small. Therefore, it is essential to evaluate the use of specific refinement techniques for fine-grained images (Du et al., 2021).…”
Section: Discussion
confidence: 99%
“…Additionally, our method is a single-stage model with lower training complexity, while TPSKG is a two-stage model, which further demonstrates the effectiveness of our proposed method.

[Excerpt of a comparison table; the two columns are accuracies (%) on two benchmarks whose names were not preserved in this snippet.]
…              86.5   85.2
SEF [28]       87.3   88.8
Cross-X [29]   87.7   88.9
FDL [30]       89.1   84.9
FBSD [31]      89.8   89.4
API-NET [13]   90.0   90.3
PMG-V2 [19]    90.0   90.7…”
Section: Comparison With the State-of-the-art
confidence: 99%
“…In recent years, a growing number of models and methods have been proposed for FGIR. Du et al. [19] fed a pair of images of the same category into the network and extracted the feature maps at different stages of the network. Based on the comparison between the feature maps of the same category at different stages, they proposed a category-consistency constraint to supervise the network to learn the most discriminative features within a category.…”
Section: Related Work
confidence: 99%
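The category-consistency constraint described in that statement lends itself to a compact sketch: pair two images of the same class, extract stage-wise features from a shared backbone, and penalize the distance between the pair's pooled features at each stage. Below is a minimal, hypothetical PyTorch illustration, not the authors' implementation; the function name, the `stage_features` placeholder, and the squared-L2 penalty are assumptions for exposition.

```python
import torch
import torch.nn.functional as F


def category_consistency_loss(stage_feats_a, stage_feats_b):
    """Hypothetical sketch of a category-consistency constraint:
    given stage-wise feature maps of two same-class images, pull the
    pooled features of each stage together via squared L2 distance."""
    loss = torch.zeros((), device=stage_feats_a[0].device)
    for fa, fb in zip(stage_feats_a, stage_feats_b):
        # Global-average-pool each stage's (B, C, H, W) map to a (B, C) vector.
        va = F.adaptive_avg_pool2d(fa, 1).flatten(1)
        vb = F.adaptive_avg_pool2d(fb, 1).flatten(1)
        loss = loss + F.mse_loss(va, vb)
    return loss / len(stage_feats_a)


# Usage sketch: `stage_features` stands for a backbone that returns a list
# of intermediate feature maps (e.g. outputs of several blocks); x_a, x_b
# are a same-category image pair of shape (B, 3, H, W).
#   feats_a = stage_features(x_a)
#   feats_b = stage_features(x_b)
#   loss = cls_loss + lambda_cc * category_consistency_loss(feats_a, feats_b)
```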