2021
DOI: 10.48550/arxiv.2107.10834
Preprint

Query2Label: A Simple Transformer Way to Multi-Label Classification

Shilong Liu, Lei Zhang, Xiao Yang, et al.

Abstract: This paper presents a simple and effective approach to solving the multi-label classification problem. The proposed approach leverages Transformer decoders to query the existence of a class label. The use of a Transformer is rooted in the need to extract local discriminative features adaptively for different labels, a strongly desired property because one image can contain multiple objects. The built-in cross-attention module in the Transformer decoder offers an effective way to use label embe…
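Below is a minimal Python/NumPy sketch of the idea the abstract describes. Each class owns a learnable label embedding that serves as a query. That query cross-attends over the flattened spatial feature map, and the updated query is mapped to a per-class logit. The sketch makes several simplifying assumptions of our own: a single cross-attention step with one head, no self-attention, feed-forward layers or layer normalization, random weights, and a per-class linear classifier. All function and parameter names (`label_query_decoder`, `W_cls`, `b_cls`, ...) are hypothetical. This is an illustration of the mechanism, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def label_query_decoder(feats, label_emb, Wq, Wk, Wv, W_cls, b_cls):
    """One simplified cross-attention step in which each label embedding queries the image.

    feats:     (HW, d) flattened spatial features from a backbone (assumed given)
    label_emb: (K, d)  one learnable query per class
    W_cls:     (K, d)  hypothetical per-class linear classifier weights
    b_cls:     (K,)    per-class biases
    Returns per-class probabilities (K,) and the attention map (K, HW).
    """
    d = feats.shape[1]
    q = label_emb @ Wq                               # (K, d) label queries
    k = feats @ Wk                                   # (HW, d) keys from image features
    v = feats @ Wv                                   # (HW, d) values from image features
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)    # (K, HW): where each label looks
    h = label_emb + attn @ v                         # residual update of each label query
    logits = (h * W_cls).sum(axis=1) + b_cls         # class k uses only its own h[k]
    probs = 1.0 / (1.0 + np.exp(-logits))            # independent sigmoid per class
    return probs, attn

rng = np.random.default_rng(0)
HW, d, K = 49, 16, 5              # 7x7 feature map, 16-dim features, 5 classes
feats = rng.standard_normal((HW, d))
label_emb = rng.standard_normal((K, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
W_cls = rng.standard_normal((K, d)) / np.sqrt(d)
b_cls = np.zeros(K)

probs, attn = label_query_decoder(feats, label_emb, Wq, Wk, Wv, W_cls, b_cls)
print(probs.shape, attn.shape)    # (5,) (5, 49)
```

Each row of `attn` is a distribution over image locations. It shows which regions each label query draws its features from, which is the adaptive, label-specific feature extraction the abstract motivates.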

Cited by 33 publications (58 citation statements)
References 32 publications
“…[48] suggested simple spatial attention scores, and then combined them with class-agnostic average pooling features. [24] presented a pooling transformer with learnable queries for multi-label classification, achieving top results.…”
Section: Equal Contribution (mentioning)
confidence: 99%
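The first quoted sentence describes combining simple spatial attention scores with class-agnostic average pooling features. The sketch below shows one way to build such a head. It assumes a residual formulation of our own choosing: per-location class scores are softmax-normalized over locations with a temperature `T`, and the resulting attended feature is added to the global-average-pooled feature with weight `lam`. The formula and the names `residual_attention_head`, `T` and `lam` are illustrative assumptions, not taken from [48].

```python
import numpy as np

def residual_attention_head(x, M, T=1.0, lam=0.2):
    """Hypothetical head mixing class-agnostic GAP with class-specific spatial attention.

    x: (HW, d) spatial features; M: (K, d) per-class classifier weights.
    Returns (K,) logits.
    """
    g = x.mean(axis=0)                        # (d,) class-agnostic average-pooled feature
    s = T * (M @ x.T)                         # (K, HW) per-location class scores, scaled
    a = np.exp(s - s.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)         # softmax over locations, separately per class
    att = a @ x                               # (K, d) class-specific attended features
    f = g[None, :] + lam * att                # (K, d) residual combination
    return (M * f).sum(axis=1)                # (K,) logit for each class

rng = np.random.default_rng(0)
x = rng.standard_normal((49, 32))             # 7x7 map of 32-dim features
M = rng.standard_normal((10, 32)) / np.sqrt(32)
logits = residual_attention_head(x, M, T=1.0, lam=0.2)
print(logits.shape)                           # (10,)
```

With `lam=0` the head reduces exactly to a linear classifier on GAP features. A large `T` pushes each class's attention toward its single highest-scoring location, approaching max pooling.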
“…GAP was also adopted as a baseline approach for multi-label classification [3,25,38]. Attention-based: Unlike single-label classification, in multi-label classification several objects can appear in the image, in different locations and sizes. Several works [14,24,48] have noticed that the GAP operation, which eliminates the spatial dimension via simple averaging, can be sub-optimal for identifying multiple objects with different sizes. Instead, they suggested using attention-based classification heads, which make more elaborate use of the spatial data, with improved results.…”
Section: Baseline Classification Heads (mentioning)
confidence: 99%
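The following toy numeric illustration makes the GAP argument concrete. It uses hypothetical 7x7 class-score maps in which object cells score 10 and background cells score 0. Under GAP, the pooled score scales with the fraction of the map the object covers, so a small object is diluted by background. Softmax attention pooling, by contrast, stays close to the object's score regardless of object size. The maps and the temperature `T` are illustrative assumptions.

```python
import numpy as np

def gap_pool(score_map):
    # Global average pooling: every location gets the same weight 1/HW.
    return float(score_map.mean())

def attention_pool(score_map, T=1.0):
    # Softmax attention over locations: high-scoring locations dominate the sum.
    s = score_map.ravel()
    w = np.exp(T * (s - s.max()))
    w /= w.sum()
    return float(w @ s)

def object_map(size, n=7, strength=10.0):
    # Hypothetical n x n class-score map: `size` cells cover the object, the rest is background (0).
    m = np.zeros(n * n)
    m[:size] = strength
    return m.reshape(n, n)

for size in (1, 4, 25):
    m = object_map(size)
    print(f"object cells={size:2d}  GAP={gap_pool(m):.3f}  attention={attention_pool(m):.3f}")
```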
“…Evaluation Metrics. Following previous works [15,38,42], besides the mean average precision (mAP), we employ several metrics to better demonstrate the performance of the proposed approach. Treating a label as predicted positive when its output probability is greater than a threshold (e.g., 0.5), we report the average per-class precision (CP), recall (CR), and F1 score (CF1).…”
Section: Multi-label Classification (mentioning)
confidence: 99%
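Here is a minimal sketch of the thresholded per-class metrics described above. It assumes `probs` and `targets` are `(N, K)` arrays over N images and K classes, and that a label counts as positive when its probability is greater than the threshold. Two conventions are our assumptions rather than statements from the quoted work. First, a class with no predicted (or no ground-truth) positives contributes 0 to the average. Second, CF1 is computed from the averaged CP and CR instead of by averaging per-class F1 scores. mAP is not computed here.

```python
import numpy as np

def per_class_metrics(probs, targets, threshold=0.5):
    """probs, targets: (N, K) arrays; targets hold 0/1 labels.

    Returns (CP, CR, CF1) as fractions in [0, 1].
    """
    preds = probs > threshold                        # predicted positive if prob > threshold
    gt = targets.astype(bool)
    tp = (preds & gt).sum(axis=0).astype(float)      # per-class true positives
    n_pred = preds.sum(axis=0)                       # per-class predicted positives
    n_gt = gt.sum(axis=0)                            # per-class ground-truth positives
    # Assumption: classes with an empty denominator contribute 0 instead of NaN.
    precision = np.divide(tp, n_pred, out=np.zeros_like(tp), where=n_pred > 0)
    recall = np.divide(tp, n_gt, out=np.zeros_like(tp), where=n_gt > 0)
    cp, cr = precision.mean(), recall.mean()         # average over classes
    cf1 = 2 * cp * cr / (cp + cr) if cp + cr > 0 else 0.0
    return cp, cr, cf1

probs = np.array([[0.9, 0.2, 0.6],
                  [0.4, 0.8, 0.1],
                  [0.7, 0.3, 0.55]])
targets = np.array([[1, 0, 1],
                    [0, 1, 0],
                    [1, 1, 0]])
cp, cr, cf1 = per_class_metrics(probs, targets)
print(f"CP={cp:.3f} CR={cr:.3f} CF1={cf1:.3f}")    # CP=0.833 CR=0.833 CF1=0.833
```

Libraries differ on both conventions above. For example, averaging per-class F1 scores gives a different number than the harmonic mean of CP and CR, so check which definition a comparison table uses.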