2022
DOI: 10.48550/arxiv.2206.10552
Preprint

Vicinity Vision Transformer

Abstract: Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prevents vision transformers from scaling up to high-resolution images, because both its computational complexity and its memory footprint are quadratic. Although linear attention was introduced in natural language processing (NLP) tasks to mitigate a similar issue, directly applying existing linear attention to vision transformers may not lead to satisfactory results. We investigate…

Citations: cited by 2 publications (2 citation statements)
References: 31 publications

“…To reduce the complexity close to linear O(N), linear Transformers [37,63,15,14,64,87,72,71] decompose the similarity function δ(·) to a kernel function ρ(·), where δ(QK…”
Section: Linear Attention (mentioning)
confidence: 99%
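
The kernel decomposition mentioned in the statement above can be illustrated with a minimal sketch. It assumes a positive feature map ρ(x) = elu(x) + 1, which is an illustrative choice and is not taken from the cited paper or from this preprint; the function names are hypothetical as well. The sketch shows only the general reordering trick: summing over keys and values once gives the same result as building all N × N pairwise similarities, at a cost linear in the number of tokens N for a fixed feature dimension.

```python
import math

def rho(x):
    # Assumed feature map elu(x) + 1 (always positive); illustrative only.
    return x + 1.0 if x > 0 else math.exp(x)

def feature_map(M):
    return [[rho(x) for x in row] for row in M]

def quadratic_attention(Q, K, V):
    """O(N^2): compute every pairwise similarity rho(q_i) . rho(k_j)."""
    Qf, Kf = feature_map(Q), feature_map(K)
    dv = len(V[0])
    out = []
    for q in Qf:
        weights = [sum(a * b for a, b in zip(q, k)) for k in Kf]
        total = sum(weights)  # row normaliser
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(dv)])
    return out

def linear_attention(Q, K, V):
    """O(N): summarise keys and values once, then reuse for each query."""
    Qf, Kf = feature_map(Q), feature_map(K)
    d, dv = len(Kf[0]), len(V[0])
    # kv[a][c] = sum_j rho(k_j)[a] * v_j[c], a d x dv summary matrix
    kv = [[sum(k[a] * v[c] for k, v in zip(Kf, V)) for c in range(dv)]
          for a in range(d)]
    # k_sum[a] = sum_j rho(k_j)[a], used for the row normaliser
    k_sum = [sum(k[a] for k in Kf) for a in range(d)]
    out = []
    for q in Qf:
        norm = sum(q[a] * k_sum[a] for a in range(d))
        out.append([sum(q[a] * kv[a][c] for a in range(d)) / norm
                    for c in range(dv)])
    return out
```

Both functions return the same values up to floating-point rounding; only the order of the multiplications differs, which is what removes the explicit N × N similarity matrix.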
“…Efficient Transformers [17,20,25,31] have achieved remarkable advances in recent years. They reduce the quadratic computational complexity of the standard Transformer [35] by sparsifying or approximating Softmax attention in a more efficient fashion.…”
Section: Introduction (mentioning)
confidence: 99%