MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture 2021
DOI: 10.1145/3466752.3480125
Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture

Cited by 80 publications (62 citation statements)
References 53 publications
“…Recent works look into co-designing for sparse architectures. Sanger prunes the attention matrix so that its reconfigurable architecture can exploit the resulting sparsity [36]. ESCALATE utilizes kernel decomposition to accelerate CNN models [33].…”
Section: Related Work (mentioning)
Confidence: 99%
“…Hardware-algorithm co-design for attention models. Several algorithmic optimizations co-designed with hardware acceleration were proposed for efficient execution of attention models [34,35,60,64,89,92,96,108]. A³ has proposed an approximation method with a hardware accelerator to prune out the ineffectual computations in attention.…”
Section: Related Work (mentioning)
Confidence: 99%
“…EdgeBERT [92] leverages an entropy-based early-exiting technique to predict the minimal number of transformer layers that need to be executed, while the rest can be skipped. Other works aim to address the computational cost of self-attention via sparse matrix operations [13,60,64], quantization [108], and Softmax approximation [89]. Moreover, none of these prior designs explored bit-level early compute termination.…”
Section: Related Work (mentioning)
Confidence: 99%
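To illustrate the early-exit idea in the statement above, here is a minimal, hypothetical Python sketch; the `layers`/`exit_classifiers` interfaces and the 0.2 entropy threshold are illustrative assumptions, not EdgeBERT's actual implementation.

```python
# Hypothetical sketch of entropy-based early exiting across transformer layers.
# `layers` and `exit_classifiers` are assumed to be lists of callables that
# return NumPy arrays; the threshold value is illustrative only.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(probs, eps=1e-12):
    # Shannon entropy of a probability vector: low entropy = confident prediction.
    return float(-np.sum(probs * np.log(probs + eps)))

def early_exit_forward(hidden, layers, exit_classifiers, threshold=0.2):
    """Run transformer layers one at a time; return as soon as an intermediate
    classifier is confident enough, skipping the remaining layers."""
    probs = None
    for layer, clf in zip(layers, exit_classifiers):
        hidden = layer(hidden)          # run one transformer layer
        probs = softmax(clf(hidden))    # intermediate prediction at this depth
        if entropy(probs) < threshold:  # confident -> exit early
            return probs
    return probs                        # ran all layers without an early exit
```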
“…CSP avoids sparsity-skipping logic and instead incorporates an early-stop mechanism based on the induced sparsity pattern. Sanger [24] is another 2-way sparse approach that targets the dynamic structures of attention-based models (i.e., the Logit and Attend operators); it dynamically applies fine-grained structured pruning with a dataflow that is well suited for the Logit and Attend operators. CSP-A is not a dynamic pruning method and instead targets the static elements of the attention layers, thus treating the Logit and Attend operators as dense.…”
Section: Related Work (mentioning)
Confidence: 99%
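To make the threshold-style dynamic attention pruning described above concrete, here is a minimal NumPy sketch; the post-softmax thresholding, the 0.02 cutoff, and the renormalization step are illustrative assumptions rather than Sanger's exact prediction-and-pack pipeline.

```python
# Minimal sketch of dynamic, threshold-based pruning of attention weights,
# in the spirit of the sparse-attention approaches cited above. The threshold
# and masking granularity are illustrative, not Sanger's actual algorithm.
import numpy as np

def pruned_attention(Q, K, V, threshold=0.02):
    """Compute attention, zeroing weights below `threshold` so that the
    Attend step (weights @ V) can skip the masked-out entries."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # Logit operator
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax
    mask = weights >= threshold                            # dynamic sparsity pattern
    sparse_w = np.where(mask, weights, 0.0)
    sparse_w /= sparse_w.sum(axis=-1, keepdims=True) + 1e-12  # renormalize rows
    return sparse_w @ V, mask                              # Attend operator + mask

# Example: the mask density shows how much of the Attend work could be skipped.
Q, K, V = (np.random.randn(8, 64) for _ in range(3))
out, mask = pruned_attention(Q, K, V)
print("kept fraction of attention entries:", mask.mean())
```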