2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01058
Shunted Self-Attention via Multi-Scale Token Aggregation

Cited by 201 publications (60 citation statements)
References 14 publications
“…Here, we introduce a local context enhancement term LCE(V) as in [37]. Function LCE(·) is parametrized with a depth-wise convolution, and we set the kernel size to 5.…”
Section: Bi-level Routing Attention (BRA)
confidence: 99%
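The enhancement the statement describes (a depth-wise convolution over the value tensor V with kernel size 5, following [37]) can be sketched in plain NumPy. The function name `lce`, the explicit loops, and the per-channel kernel layout are illustrative assumptions, not the cited authors' implementation:

```python
import numpy as np

def lce(v, kernels):
    """Local context enhancement: depth-wise convolution over V.

    v:       (C, H, W) value tensor reshaped onto the spatial grid
    kernels: (C, k, k) one kernel per channel (depth-wise, k = 5 in [37])
    In the cited attention, the output is added to the attention result:
    softmax(QK^T / sqrt(d)) V + LCE(V).
    """
    C, H, W = v.shape
    k = kernels.shape[-1]
    p = k // 2  # same-padding so output keeps the spatial size
    vp = np.pad(v, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(v)
    for c in range(C):          # depth-wise: each channel convolved independently
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(vp[c, i:i + k, j:j + k] * kernels[c])
    return out
```

A depth-wise kernel whose only nonzero entry is the center reproduces the input, which makes the same-padding behavior easy to check.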
“…Following common practice for Transformers, we propose three variants of SCAT: tiny (t), small (s) and base (b), to closely match the model sizes (numbers of parameters) of the Shunted Transformer [7]. Note that all variants of SCAT have the same embedding dimension and hidden dimension for the lightweight decoder.…”
Section: Model Configurations
confidence: 99%
“…We set the batch size for training to 32 and train 160K iterations for all models. Focal loss [15] is utilized for the experiments as follows: (7) where … denotes the ground-truth semantic segmentation map, … denotes the focal loss function [15] with … as the predicted probability of the true class, and … is a trade-off hyper-parameter balancing the losses of the coarse segmentation map and the refined prediction.…”
Section: Implementation Details
confidence: 99%
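The quotation elides the symbols of equation (7), but the structure it describes — a focal loss term on the coarse segmentation map plus a weighted focal loss term on the refined prediction — can be sketched as follows. The names `focal_loss`, `combined_loss`, and the default values of `gamma`, `alpha`, and `lam` are illustrative assumptions; the cited paper's actual hyper-parameters are not given in the statement:

```python
import numpy as np

def focal_loss(p_t, gamma=2.0, alpha=0.25):
    # Standard focal loss on the predicted probability p_t of the true class:
    # FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t)
    p_t = np.clip(p_t, 1e-7, 1.0)  # numerical guard against log(0)
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)

def combined_loss(p_coarse, p_refined, lam=0.5):
    # lam plays the role of the trade-off hyper-parameter in eq. (7),
    # balancing the coarse-map loss against the refined-prediction loss.
    return np.mean(focal_loss(p_coarse)) + lam * np.mean(focal_loss(p_refined))
```

The `(1 - p_t)^gamma` factor down-weights well-classified pixels, so confident correct predictions contribute almost nothing to the loss.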
“…Visual attention has been widely used in deep learning and achieves remarkable advances [59,69]. It has been exploited in computer vision tasks such as image recognition [9,10,33,48,61,71] and object detection among others [6,16,35,66,67]. CAM [77] provides the attention visualization of feature maps for model interpretable analysis.…”
Section: Related Work
confidence: 99%