2021
DOI: 10.48550/arxiv.2112.02841
Preprint

GETAM: Gradient-weighted Element-wise Transformer Attention Map for Weakly-supervised Semantic segmentation

Abstract: Weakly-supervised semantic segmentation (WSSS) is challenging, particularly when image-level labels are used to supervise pixel-level prediction. To bridge their gap, a Class Activation Map (CAM) is usually generated to provide pixel-level pseudo labels. CAMs in Convolutional Neural Networks suffer from partial activation, i.e., only the most discriminative regions are activated. Transformer-based methods, on the other hand, are highly effective at exploring global context, potentially alleviating the "partial ac…
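For readers unfamiliar with how a CAM turns image-level supervision into pixel-level pseudo labels, the sketch below shows a generic Grad-CAM-style computation in PyTorch. The function name, tensor shapes, and foreground threshold are illustrative assumptions, not the GETAM method itself.

```python
import torch
import torch.nn.functional as F

def cam_pseudo_label(features, logits, class_idx, image_size, fg_threshold=0.3):
    """Grad-CAM-style class activation map used as a pixel-level pseudo label.

    features : (B, C, h, w) activations from the last conv block (requires grad)
    logits   : (B, num_classes) image-level classification scores
    """
    # Gradient of the target class score w.r.t. the feature maps
    grads = torch.autograd.grad(logits[:, class_idx].sum(), features,
                                retain_graph=True)[0]            # (B, C, h, w)
    weights = grads.mean(dim=(2, 3), keepdim=True)               # per-channel weights
    cam = F.relu((weights * features).sum(dim=1, keepdim=True))  # (B, 1, h, w)
    cam = F.interpolate(cam, size=image_size, mode="bilinear", align_corners=False)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-6)      # normalize to [0, 1]
    # Pixels above the threshold become foreground pseudo labels for this class
    return (cam > fg_threshold).float()
```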


Cited by 2 publications (3 citation statements)
References: 38 publications
“…Inspired by the popular Grad-CAM technique (Selvaraju et al., 2017), we incorporated head-wise gradient-weighting for self-attention maps to boost the presentation of the weights relevant to specific class activation for the first time. Compared with other attention mapping techniques (Chefer et al., 2021; Sun et al., 2021) that relied on the ViT, we were also the first to implement it on the more complex Swin transformer that was intended to improve upon the ViT. The enhanced visualization of the attention maps and ICH segmentation accuracy are evident in Fig.…”
Section: Discussion (mentioning)
confidence: 99%
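As a rough illustration of the head-wise gradient weighting described in this statement, the following PyTorch sketch weights each attention head by the mean gradient of the target class score over its attention map. The function name, tensor shapes, and per-head averaging are assumptions rather than the citing paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def headwise_grad_weighted_attn(attn, class_score):
    """Head-wise gradient weighting of one block's self-attention map.

    attn        : (B, H, N, N) attention weights kept in the autograd graph
                  (e.g. saved with attn.retain_grad() or a forward hook)
    class_score : scalar logit of the target class
    """
    # Gradient of the class score w.r.t. every attention weight
    grads = torch.autograd.grad(class_score, attn, retain_graph=True)[0]  # (B, H, N, N)
    head_weights = grads.mean(dim=(2, 3), keepdim=True)  # one Grad-CAM-style weight per head
    weighted = F.relu(head_weights * attn)               # boost heads relevant to the class
    return weighted.sum(dim=1)                           # (B, N, N) fused attention map
```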
“…For instance, Chefer et al. (2021) utilize the Taylor Decomposition principle to assign and propagate a local relevance score through the layers of a ViT model. Similarly, Sun et al. (2021) and Barkan et al. (2021) employ attention gradient weighting on ViT and BERT models, respectively. However, these approaches primarily focused on the attention weight of the "cls" token, and the latter two methods weighed each token's attention weight through element-wise multiplication.…”
Section: Layer Attention Map Generation (mentioning)
confidence: 99%
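To make the contrast concrete, the sketch below weights every token-to-token attention entry by its own gradient (element-wise) and then reads off the "cls" row as a patch relevance map. Names, shapes, and the head-averaging step are illustrative assumptions, not the exact method of any cited work.

```python
import torch
import torch.nn.functional as F

def elementwise_grad_weighted_attn(attn, class_score, cls_index=0):
    """Element-wise gradient weighting of every token-to-token attention entry.

    attn        : (B, H, N, N) attention weights kept in the autograd graph
    class_score : scalar logit of the target class
    """
    grads = torch.autograd.grad(class_score, attn, retain_graph=True)[0]  # (B, H, N, N)
    weighted = F.relu(attn * grads)   # each attention weight scaled by its own gradient
    fused = weighted.mean(dim=1)      # average over heads -> (B, N, N)
    # The row of the "cls" token gives a relevance score for every patch token
    return fused[:, cls_index, 1:]    # (B, N-1)
```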