2023
DOI: 10.48550/arxiv.2302.02451
Preprint

KDEformer: Accelerating Transformers via Kernel Density Estimation

Abstract: The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., the Transformer) for sequence modeling; however, naïve exact computation of this model incurs quadratic time and memory complexities in the sequence length, hindering the training of long-sequence models. The critical bottlenecks are the computation of the partition functions in the denominator of the softmax function and the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former ca…
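As a point of reference for the bottleneck the abstract describes, the following is a minimal NumPy sketch of naïve exact attention, showing where the quadratic n × n cost and the softmax partition function arise. The function name, shapes, and usage are illustrative assumptions, not code from the paper, and this sketch does not implement KDEformer's accelerated method.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Naive exact dot-product attention: O(n^2) time and memory in sequence length n."""
    # n x n score matrix: this is where the quadratic cost arises.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Subtract row-wise max for numerical stability before exponentiating.
    scores -= scores.max(axis=-1, keepdims=True)
    A = np.exp(scores)
    # Partition function: row sums forming the softmax denominator (first bottleneck).
    Z = A.sum(axis=-1, keepdims=True)
    # Softmax matrix multiplied with the value matrix (second bottleneck).
    return (A / Z) @ V

# Illustrative usage with random data; sizes are arbitrary.
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, n, d))
out = exact_attention(Q, K, V)  # shape (n, d)
```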

Cited by 2 publications (1 citation statement)
References 14 publications

“…Research conducted by various sources such as [CGRS19, KKL20, WLK+20, DKOD20, KVPF20, CDW+21, CDL+22] has underscored this perspective. Motivated by this, [ZHDK23, AS23] study the computation of the attention matrix from the hardness perspective and propose faster algorithms.…”
Section: Algorithmic Regularization (mentioning)
confidence: 99%