2020
DOI: 10.1609/aaai.v34i07.6822

Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization

Abstract: Delicate attention of the discriminative regions plays a critical role in Fine-Grained Visual Categorization (FGVC). Unfortunately, most of the existing attention models perform poorly in FGVC, due to the pivotal limitations in discriminative regions proposing and region-based feature learning. 1) The discriminative regions are predominantly located based on the filter responses over the images, which can not be directly optimized with a performance metric. 2) Existing methods train the region-based feature ex…

Cited by 139 publications (75 citation statements)
References 22 publications
“…                  Backbone      Accuracy
ResNet50 [13]       ResNet50      84.5
GP-256 [31]         VGG16         85.8
MaxEnt [8]          DenseNet161   86.6
DFL-CNN [29]        ResNet50      87.4
NTS-Net [34]        ResNet50      87.5
Cross-X [20]        ResNet50      87.7
DCL [3]             ResNet50      87.8
CIN [10]            ResNet101     88.1
DBTNet [37]         ResNet101     88.1
ACNet [15]          ResNet50      88.1
S3N [5]             ResNet50      88.5
FDL [19]            DenseNet161   89.1
PMG [7]             ResNet50      89.6
API-Net [39]        DenseNet161   90.0
StackedLSTM [11]    GoogleNet     90.4
ViT [6]             ViT-B_16      90.8
TransFG [12]        ViT-B_16      91.7
FFVT                ViT-B_16      91.6…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Recently, several studies [19–21] use only image labels to locate corresponding parts to compare their appearance, without compromising performance. This extensively alleviates the limitations of annotation.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…                     Backbone       Acc. (%)
ResNet-50 [17]         ResNet-50      84.5
RA-CNN [12]            VGG-19         85.3
GP-256 [37]            VGG-16         85.8
MaxExt [11]            DenseNet-161   86.6
DFL-CNN [34]           ResNet-50      87.4
NTS-Net [39]           ResNet-50      87.5
Cross-X [26]           ResNet-50      87.7
DCL [4]                ResNet-50      87.8
CIN [14]               ResNet-101     88.1
DBTNet [42]            ResNet-101     88.1
ACNet [21]             ResNet-50      88.1
S3N [8]                ResNet-50      88.5
FDL [25]               DenseNet-161   89.1
PMG [10]               ResNet-50      89.6
API-Net [47]           DenseNet-161   90.0
StackedLSTM [15]       GoogleNet      90.4
MMAL-Net [40]          ResNet-50      89.6
ViT [9]                ViT-B_16       90.6
TransFG & PSM [16]     ViT-B_16       90.9…”
Section: Methods
Citation type: mentioning
confidence: 99%