2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00515
Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition

Abstract: Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, but they often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals with a Trilinear Attention Sampling Network (TASN), trained in an efficient teacher-student manner. Specifically…

Cited by 429 publications (274 citation statements)
References 32 publications
“…In this section, we comprehensively analyze and evaluate VTNs on two tasks: fine-grained image recognition and instance-level image retrieval. First, we analyze the influence of the different components of VTNs compared to existing spatial deformation modeling methods [28,10,48,73] and the impact of combining VTNs with different backbone networks [55,21] and second-order pooling strategies [38,17,9,36,35]. Second, we compare VTNs with the state-of-the-art methods on fine-grained image recognition benchmarks [64,32,40,22].…”
Section: Methods
confidence: 99%
“…Analysis of the VTN components. To validate the different components of our VTNs, we compare them with previous spatial deformation modeling methods, such as STNs [28], deformable convolution (Def-Conv) [10], saliency-based sampler (SSN) [48], and attention-based sampler (ASN) [73] on fine-grained image recognition benchmarks, such as CUB-Birds [64], Stanford-Cars [32], and FGVC-Aircraft [40]. For the comparison to be fair, we apply these methods at the same layer as ours, i.e., the last convolutional layer.…”
Section: Fine-grained Image Recognition
confidence: 99%