2021
DOI: 10.48550/arxiv.2107.06538
Preprint

Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition

Xinda Liu,
Lili Wang,
Xiaoguang Han

Abstract: Fine-grained image recognition is challenging because discriminative clues are usually fragmented, whether from a single image or multiple images. Despite their significant improvements, the majority of existing methods still focus on the most discriminative parts from a single image, ignoring informative details in other regions and lacking consideration of clues from other associated images. In this paper, we analyze the difficulties of fine-grained image recognition from a new perspective and propose a tran…


Cited by 3 publications (7 citation statements)
References 45 publications
“…For example, TransFG [16] proposes the first ViT-based fine-grained classification method, in which a part selection module is designed to select discriminative tokens. Following that, several methods [19,29,39] also apply ViT to FGVC. Among these, TPSKG [29] proposes a peak suppression module, which penalizes the attention to the most discriminative part, and a knowledge guidance module to obtain the knowledge response coefficients.…”
Section: Related Work
Mentioning confidence: 99%
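The peak suppression idea described in this citation can be illustrated with a minimal sketch (a hypothetical PyTorch version, not the authors' released implementation; the tensor layout and the function name peak_suppression are assumptions): the patch token receiving the highest class-token attention is masked out, so a loss computed on the masked features penalizes over-reliance on the single most discriminative part.

```python
import torch

def peak_suppression(tokens: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
    """Zero out the patch token that receives the highest class-token attention.

    tokens: (B, N, D) patch embeddings, class token excluded (assumed layout)
    attn:   (B, N)    attention weights from the class token to each patch
    """
    peak_idx = attn.argmax(dim=1, keepdim=True)   # (B, 1) index of the attention peak
    mask = torch.ones_like(attn)                  # (B, N) all-ones mask
    mask.scatter_(1, peak_idx, 0.0)               # set the mask to 0 at the peak position
    return tokens * mask.unsqueeze(-1)            # broadcast the mask over the embedding dim

# Example: a batch of 2 images, 196 patch tokens of dimension 768
tokens = torch.randn(2, 196, 768)
attn = torch.rand(2, 196)
suppressed = peak_suppression(tokens, attn)
```

In a TPSKG-style setup, the loss on these suppressed features would be added to the standard classification loss during training, encouraging the network to also exploit less prominent regions.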
“…Following that, several methods [19,29,39] also apply ViT to FGVC. Among these, TPSKG [29] proposes a peak suppression module, which penalizes the attention to the most discriminative part, and a knowledge guidance module to obtain the knowledge response coefficients. RAMS-Trans [19] learns discriminative region attention in a multi-scale way.…”
Section: Related Work
Mentioning confidence: 99%
“…Other studies focus on extracting more useful features from multi-channel networks [5,57] or contrastive learning [1,13]. TransFG [17] and TPSKG [27] have recently used the Transformer architecture to improve classification performance. Fine-grained methods generally suffer from complex pipelines and extensive manual design.…”
Section: Related Work
Mentioning confidence: 99%