2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00288

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

Abstract: Weakly supervised object localization (WSOL) is a challenging problem: only image category labels are given, yet the goal is to learn object localization models. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNNs, where the convolution operations produce local receptive fields and expe…
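To make the "local discriminative regions" point concrete, the following is a minimal sketch of a class activation map (CAM) in its standard form for a CNN with global average pooling: the map for one class is the classifier-weighted sum of the last convolutional feature maps. This is the baseline CAM formulation, not TS-CAM's method. The toy feature values, the weights, the min-max normalization, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import List

Grid = List[List[float]]


def class_activation_map(features: List[Grid], weights: List[float]) -> Grid:
    """CAM for one class: M(x, y) = sum_k w_k * F_k(x, y) over K feature maps."""
    if len(features) != len(weights):
        raise ValueError("need exactly one classifier weight per feature map")
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w_k in zip(features, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w_k * fmap[y][x]
    return cam


def normalize(cam: Grid) -> Grid:
    """Min-max scale the map to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in cam]


def localization_mask(cam: Grid, threshold: float = 0.5) -> List[List[bool]]:
    """Illustrative thresholding: cells at or above `threshold` count as object."""
    return [[v >= threshold for v in row] for row in normalize(cam)]


# Toy 2 x 3 example: map 0 fires only on one discriminative part,
# map 1 responds weakly over the whole object extent.
features = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.2, 0.2, 0.2], [0.2, 0.2, 0.0]],
]
weights = [2.0, 0.5]  # hypothetical classifier weights for one class
cam = class_activation_map(features, weights)
mask = localization_mask(cam)
print(mask)  # [[False, True, False], [False, False, False]]
```

Only the cell driven by the heavily weighted, part-specific feature map survives the threshold, while the weakly activated rest of the object is dropped. This is the partial activation issue the abstract describes.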

Cited by 158 publications (96 citation statements). References 52 publications.
“…We expect to convert tokens into activation maps for each label class (i.e. aware of semantic meaning [16]). For strong-label datasets, we can let the model directly calculate the loss in specific time ranges.…”
Section: Token Semantic Module (mentioning; confidence: 99%)
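The quoted idea of converting each output token into an activation map per label class can be sketched as a per-token projection onto class scores. This is a minimal sketch under stated assumptions: the linear projection (weight plus bias) and the mean pooling used for a clip-level score are illustrative choices, and the actual token-semantic modules in TS-CAM and in the citing audio model may use a convolutional head or a different pooling. All names and values below are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def token_semantic_map(tokens: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Project N tokens (N x D) to an N x C map: one score per token per class."""
    dim, num_classes = len(weight), len(bias)
    if any(len(t) != dim for t in tokens) or any(len(r) != num_classes for r in weight):
        raise ValueError("expected tokens N x D, weight D x C, bias of length C")
    return [
        [sum(t[d] * weight[d][c] for d in range(dim)) + bias[c] for c in range(num_classes)]
        for t in tokens
    ]


def clip_scores(act_map: Matrix) -> List[float]:
    """Illustrative weak-label readout: average each class score over all tokens."""
    n = len(act_map)
    return [sum(row[c] for row in act_map) / n for c in range(len(act_map[0]))]


# Toy example: 3 time-step tokens with D = 2, mapped to C = 2 event classes.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical learned projection (D x C)
bias = [0.0, 0.0]
act_map = token_semantic_map(tokens, weight, bias)
print(act_map)  # [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
print(clip_scores(act_map))
```

Each row of `act_map` is one token's class-wise activation. With strong labels, as the quote notes, the rows inside an annotated time range could be supervised directly; with only clip-level (weak) labels, just the pooled score would receive a loss.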
“…This inspires us to design a module that makes every output token of an audio transformer aware of the semantic meaning of events (i.e. a token-semantic module [16]) for supporting more audio tasks (e.g. sound event detection and localization).…”
Section: Introduction (mentioning; confidence: 99%)
“…Similar to [8]. Baseline models: To validate our F-CAM method, we compare with recent WSOL methods, including: CAM [57], HaS [34], ACoL [53], SPG [54], ADL [9], CutMix [51], CSTN [22], TS-CAM [13], MEIL [21], DANet [47], SPOL [44], ICL [17], NL-CCAM [49], I²C [55], RCAM [56], GC-Net [20], ADL-TAP [1], GradCAM [32], GradCAM++ [7], Smooth-GradCAM++ [25], XGradCAM [12], LayerCAM [15]. For CAM, HaS, ACoL, SPG, ADL, and CutMix, we present the results reported in [8].…”
Section: Implementation Details (mentioning; confidence: 99%)
“…2) Affordance Detection from Machine Learning Perspectives. Weakly Supervised, Semi-Supervised, and Unsupervised Affordance Detection: Affordance detection based on supervised learning usually requires large-scale labeled data with pixel-level accurate annotations for training, which are labor-intensive to collect and annotate. Alternatively, weakly supervised, semi-supervised, and unsupervised learning methods are also worth further study for affordance detection (Gao et al. 2021; Nagarajan et al. 2019; Pan et al. 2021; Wang et al. 2021b).…”
Section: Multimodal Affordance Detection (mentioning; confidence: 99%)