Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2022
DOI: 10.1145/3503222.3507738

DOTA: detect and omit weak attentions for scalable transformer acceleration

Abstract: Transformer Neural Networks have demonstrated leading performance in many applications spanning language understanding, image processing, and generative modeling. Despite the impressive performance, long-sequence Transformer processing is expensive due to the quadratic computation complexity and memory consumption of self-attention. In this paper, we present DOTA, an algorithm-architecture co-design that effectively addresses the challenges of scalable Transformer inference. Based on the insight that not all c…
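As a rough, self-contained illustration of the "detect and omit weak attentions" idea named in the title and abstract (not DOTA's actual detection mechanism or hardware, which the truncated abstract does not spell out here), the NumPy sketch below keeps only the strongest attention scores per query and masks out the rest before softmax. The function name, `keep_ratio`, and the top-k pruning rule are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention_with_omission(Q, K, V, keep_ratio=0.25):
    """Keep only the strongest `keep_ratio` fraction of scores per query,
    mask ("omit") the rest, then softmax and multiply by V.
    Illustrative only: the full score matrix is still computed here, so this
    shows the pruning criterion, not an accelerator's compute savings."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) attention logits
    k = max(1, int(keep_ratio * scores.shape[-1]))
    thresh = np.partition(scores, -k, axis=-1)[:, -k][:, None]  # k-th largest per row
    masked = np.where(scores >= thresh, scores, -np.inf)        # omit weak attentions
    return softmax(masked) @ V

# Tiny usage example with random data
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))                 # n=8 tokens, d=4 dims
print(sparse_attention_with_omission(Q, K, V).shape) # (8, 4)
```

Because the sketch still materializes the full n-by-n score matrix, it only demonstrates which entries would be omitted; the point of an accelerator co-design is to avoid that quadratic cost in the first place.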

Cited by 38 publications (20 citation statements) | References 37 publications

Citation statements (ordered by relevance):
“…Our ViTCoD split-and-conquer algorithm exhibits great potential in both reducing the dominant attention computations and alleviating the irregularity of the resulting sparse attention masks. However, this potential cannot be fully exploited by existing Transformer accelerators [21], [27], [39] because (1) they are designed for dynamic sparse attention, which requires both on-the-fly mask generation and highly reconfigurable architecture support, both of which incur nontrivial overheads, and (2) they are not dedicated to processing the two enforced distinct workloads, i.e., the denser and sparser patterns, from our ViTCoD algorithm. As such, our ViTCoD accelerator is motivated to exploit the new opportunities, i.e., fixed and structurally sparse patterns, resulting from the ViTCoD algorithm to boost ViTs' inference efficiency.…”
Section: A Motivation of ViTCoD Accelerator
confidence: 99%
“…Baselines: To benchmark ViTCoD against SOTA attention accelerators, we consider a total of five baselines, including three general platforms: CPU (Intel Xeon Gold 6230R), EdgeGPU (Nvidia Jetson Xavier NX), and GPU (Nvidia 2080Ti), and two attention accelerators: SpAtten [39] and Sanger [21]. Note that when benchmarking against GPUs with larger batch sizes, we scale up the accelerators' hardware resources to a comparable peak throughput for a fair comparison, following [27]. Metrics: We evaluate all platforms in terms of latency speedup and energy efficiency.…”
Section: Experiments, A. Experiments Setting
confidence: 99%
“…Then, the sparse attention matrix with reduced entries goes through the softmax operation, after which it is multiplied by a dense value matrix. Many works in algorithms [2], [10], [12], [13], [48], [49] and hardware [21], [22], [28], [33], [40] have been proposed to implement such sparse attention for NLP-based Transformer models by efficiently tackling various static and dynamic sparse patterns.…”
Section: Introduction
confidence: 99%
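The quoted passage summarizes the common sparse-attention pipeline: prune entries of the score matrix, apply softmax over the surviving entries, then multiply by a dense value matrix. Below is a minimal sketch of that pipeline using a static sparse pattern (a sliding-window mask); the mask shape, window size, and function names are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def sliding_window_mask(n, window=1):
    """A static sparse pattern (hypothetical example): each query attends
    only to keys within `window` positions of itself."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def masked_sparse_attention(Q, K, V, mask):
    """Prune the score matrix with `mask`, softmax the surviving entries,
    then multiply by the dense value matrix, as in the quoted pipeline."""
    d = Q.shape[-1]
    scores = np.where(mask, Q @ K.T / np.sqrt(d), -np.inf)  # reduced entries
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)                                   # masked entries -> 0
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V                                         # dense value multiply

rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(3, 6, 4))
print(masked_sparse_attention(Q, K, V, sliding_window_mask(6)).shape)  # (6, 4)
```

A static mask like this is known at design time, which is what makes fixed, structured sparsity easier for hardware to exploit than dynamically generated masks.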