2022
DOI: 10.48550/arxiv.2211.05109
Preprint
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

Abstract: Vision Transformer (ViT) has emerged as a competitive alternative to convolutional neural networks for various computer vision applications. Specifically, ViTs' multi-head attention layers make it possible to embed information globally across the overall image. Nevertheless, computing and storing such attention matrices incurs a quadratic cost dependency on the number of patches, limiting ViTs' achievable efficiency and scalability and prohibiting more extensive real-world ViT applications on resource-constrained…
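The abstract's quadratic-cost argument, and the linear Taylor attention the title refers to, can be made concrete with a small sketch. This is a simplified single-head illustration under my own assumptions, not the paper's implementation: the function names are hypothetical, the paper's sparse training-time component is omitted, and the first-order expansion exp(x) ≈ 1 + x is only accurate when the query-key dot products are small.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Vanilla attention: materializes an n-by-n score matrix, so time
    and memory grow quadratically with the number of patches n."""
    S = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))     # (n, n) scores
    return (S / S.sum(axis=-1, keepdims=True)) @ V

def taylor_attention(Q, K, V):
    """Linear attention via a first-order Taylor expansion of exp:
    the numerator 1·1ᵀV + (Q Kᵀ)V re-associates as Q(KᵀV), so the
    n-by-n matrix is never formed -- O(n·d²) instead of O(n²·d)."""
    n, _ = Q.shape
    kv = K.T @ V                    # (d, d), aggregated once over all keys
    num = V.sum(axis=0) + Q @ kv    # (n, d) numerator, linear in n
    den = n + Q @ K.sum(axis=0)     # (n,) row normalizer
    return num / den[:, None]
```

The efficiency gain comes purely from re-association: because the Taylor numerator is linear in the scores, the key-value statistics can be aggregated before the queries are applied.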

Cited by 2 publications (1 citation statement)
References 24 publications
“…Several algorithms have been proposed to improve Transformers' efficiency via approximating the softmax matrices in their attention layers with either sparse matrices [19,12,22,26] or low-rank matrices [11,18], or a combination of both [10,32,9,13]. However, all prior advances solely focused on point-wise approximating the entries of the softmax matrix and fail to provide rigorous approximation guarantees on the final output of the attention mechanism.…”
Section: Introduction (mentioning confidence: 99%)
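The quoted statement describes schemes that approximate the softmax attention matrix entry-wise with sparse, low-rank, or combined structure. A minimal sketch of the combined flavor, assuming a dense matrix A is available to decompose (practical methods avoid ever forming A; the rank and sparsity budget below are illustrative, not taken from any cited paper):

```python
import numpy as np

def lowrank_plus_sparse(A, rank=4, k=8):
    """Point-wise approximation A ≈ L + S: a truncated-SVD low-rank
    term L plus a sparse correction S on the k largest residual
    entries per row."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]       # best rank-r term
    R = A - L                                      # point-wise residual
    S = np.zeros_like(A)
    cols = np.argsort(-np.abs(R), axis=1)[:, :k]   # top-k residuals per row
    rows = np.arange(A.shape[0])[:, None]
    S[rows, cols] = R[rows, cols]                  # keep only those entries
    return L + S
```

The quote's criticism is that such schemes bound the entry-wise error of L + S against A, which does not by itself bound the error of the final attention output (L + S)V relative to AV.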