2019
DOI: 10.48550/arxiv.1905.10308
Preprint

SCRAM: Spatially Coherent Randomized Attention Maps

Abstract: Attention mechanisms and non-local mean operations in general are key ingredients in many state-of-the-art deep learning techniques. In particular, the Transformer model based on multi-head self-attention has recently achieved great success in natural language processing and computer vision. However, the vanilla algorithm computing the Transformer of an image with n pixels has O(n^2) complexity, which is often painfully slow and sometimes prohibitively expensive for large-scale image data. In this paper, we p…
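
The quadratic cost mentioned in the abstract comes from forming a full pixel-by-pixel attention matrix. Below is a minimal sketch in PyTorch (not code from the paper; the function name and identity projections are assumptions for illustration) showing why dense self-attention over an image with n pixels is O(n^2) in time and memory:

```python
import torch
import torch.nn.functional as F

def dense_self_attention(x):
    # x: (n, d) per-pixel features, where n = H * W pixels (illustrative helper).
    n, d = x.shape
    q, k, v = x, x, x                    # identity projections, kept for brevity
    scores = q @ k.t() / d ** 0.5        # (n, n) score matrix: this is the O(n^2) term
    attn = F.softmax(scores, dim=-1)     # one attention weight per pair of pixels
    return attn @ v                      # (n, d) output

x = torch.randn(32 * 32, 64)             # a small 32x32 "image" with 64-dim features
out = dense_self_attention(x)
print(out.shape)                          # torch.Size([1024, 64])
# For a 256x256 image, n = 65536 and the score matrix alone holds ~4.3e9 entries,
# which is what makes the vanilla algorithm expensive for large images.
```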

Cited by 2 publications (2 citation statements)
References 34 publications

“…To further extend the attention range, Swin [17] proposed local windows with cycling shifts. Calian et al [4], close in spirit to the proposal of this paper, propose an attention layer derived from PatchMatch to compute the attention, however in their preprint the authors do not verify the validity of their layer in a practical deep learning setting, meaning that there is no way of knowing if the layer functions correctly.…”
Section: Attention Models (citation type: mentioning)
confidence: 95%
“…Sukhbaatar et al [24] introduced the idea of a learnable adaptive span for each attention layer. Calian et al [5] proposed a fast randomized algorithm that exploits spatial coherence and sparsity to design sparse approximations. We believe that all these methods can be possibly combined with YLG, but so far nothing has been demonstrated to improve generative models in a plug-and-play way that this work shows.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
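
The citation statements above describe SCRAM only at a high level: a fast randomized, PatchMatch-derived algorithm that exploits spatial coherence and sparsity. As an illustration of the general idea of sparse, spatially local attention (not the SCRAM algorithm itself, which is PatchMatch-based), the sketch below restricts each pixel's attention to a small neighborhood, cutting the cost from O(n^2) to roughly O(n * w^2) for an n-pixel image and window size w; the function name and details are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def local_window_attention(x, window=3):
    # x: (H, W, d) per-pixel features; each pixel attends only to a window x window neighborhood.
    H, W, d = x.shape
    pad = window // 2
    # Zero-pad the spatial dims, then gather each pixel's neighborhood: (H, W, w*w, d).
    padded = F.pad(x.permute(2, 0, 1), (pad, pad, pad, pad)).permute(1, 2, 0)
    neigh = torch.stack(
        [padded[i:i + H, j:j + W] for i in range(window) for j in range(window)],
        dim=2,
    )
    q = x.unsqueeze(2)                              # (H, W, 1, d) query per pixel
    scores = (q * neigh).sum(-1) / d ** 0.5         # (H, W, w*w) local scores only
    attn = F.softmax(scores, dim=-1)
    return (attn.unsqueeze(-1) * neigh).sum(dim=2)  # (H, W, d) output

out = local_window_attention(torch.randn(32, 32, 64))
print(out.shape)                                    # torch.Size([32, 32, 64])
```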