2023
DOI: 10.48550/arxiv.2301.09262
Preprint

MEMO: Accelerating Transformers with Memoization on Big Memory Systems

Abstract: Transformers have gained popularity because of their superior prediction accuracy and inference throughput. However, transformers are computation-intensive, which leads to long inference times. Existing work on accelerating transformer inference has limitations, because it either changes the transformer architecture or requires specialized hardware. In this paper, we identify opportunities to use memoization to accelerate the attention mechanism in transformers without the above limitations. Built upon a unique o…
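The truncated abstract does not show the paper's actual design, but a minimal sketch can illustrate the general idea of memoizing attention: cache the result of an attention computation keyed on its inputs, and reuse the cached output when the same input recurs (a real system would more plausibly use approximate matching). Everything below is hypothetical, illustrative Python with NumPy, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's method): memoize scaled
# dot-product attention so repeated inputs skip the quadratic computation.
import numpy as np

class MemoizedAttention:
    def __init__(self):
        self._cache = {}  # input fingerprint -> cached attention output

    def _fingerprint(self, q, k, v):
        # Exact-match fingerprint; a real system would likely use
        # approximate matching (e.g. locality-sensitive hashing) instead.
        return (q.tobytes(), k.tobytes(), v.tobytes())

    def __call__(self, q, k, v):
        key = self._fingerprint(q, k, v)
        if key in self._cache:          # cache hit: reuse earlier result
            return self._cache[key]
        d = q.shape[-1]
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)     # (batch, seq, seq)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
        out = weights @ v                                  # (batch, seq, d)
        self._cache[key] = out          # cache miss: compute once, then store
        return out

# Usage: an identical (query, key, value) triple hits the cache on the second call.
attn = MemoizedAttention()
x = np.random.rand(1, 8, 64).astype(np.float32)
y1 = attn(x, x, x)   # computed
y2 = attn(x, x, x)   # served from the memo table
assert np.allclose(y1, y2)
```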

Cited by 2 publications (1 citation statement)
References 30 publications
“…Recent studies have suggested various strategies [13,16,[37][38][39][40] to improve the computational efficiency of self-attention. These strategies include: (1) Diminishing the dimension of input by utilizing low-rank or compressed embeddings [41][42][43]; (2) Implementing local or sparse self-attention mechanisms, such as convolutional layers [41,44] or shifted window [36,45], which can focus on nearby or relevant elements.…”
Section: Self-attention Module
confidence: 99%
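The citing statement above groups efficiency techniques into dimension reduction and local or sparse self-attention (e.g. shifted windows). As a hedged illustration of the second family only, the sketch below restricts each token to a fixed-size neighborhood; the function name, window size, and tensor shapes are assumptions made for the example and are not taken from any of the cited works.

```python
# Sketch of local (windowed) attention: each position attends only to
# positions within a fixed-size window, so cost scales with window size
# rather than full sequence length. Purely illustrative.
import numpy as np

def local_attention(q, k, v, window: int = 4):
    """Scaled dot-product attention where token i attends only to
    tokens j with |i - j| <= window."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (seq, seq)
    # Mask out positions outside the local window.
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over allowed keys
    return weights @ v                               # (seq, d)

x = np.random.rand(16, 32)
out = local_attention(x, x, x, window=2)
print(out.shape)  # (16, 32)
```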