2023
DOI: 10.3389/fams.2023.1260187

Real block-circulant matrices and DCT-DST algorithm for transformer neural network

Euis Asriani,
Intan Muchtadi-Alamsyah,
Ayu Purwarianti

Abstract: In the encoding and decoding process of transformer neural networks, a weight matrix-vector multiplication occurs in each multihead attention and feed forward sublayer. Assigning the appropriate weight matrix and algorithm can improve transformer performance, especially for machine translation tasks. In this study, we investigate the use of the real block-circulant matrices and an alternative to the commonly used fast Fourier transform (FFT) algorithm, namely, the discrete cosine transform–discrete sine transf…
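To make the weight matrix-vector multiplication in the abstract concrete, here is a minimal NumPy sketch of the commonly used FFT route that the paper takes as its baseline. It is not the authors' DCT-DST algorithm. It assumes a weight matrix partitioned into p × q blocks, each a k × k circulant matrix defined by its first column; the function names and the `(p, q, k)` storage layout are hypothetical choices made for illustration.

```python
import numpy as np


def circulant_dense(c):
    """Build the dense circulant matrix whose first column is c (C[i, j] = c[(i - j) mod n])."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]


def block_circulant_dense(w):
    """Assemble the dense (p*k) x (q*k) matrix from first columns w of shape (p, q, k)."""
    p, q, k = w.shape
    return np.block([[circulant_dense(w[i, j]) for j in range(q)] for i in range(p)])


def block_circulant_matvec(w, x):
    """Multiply the block-circulant matrix by x using real FFTs.

    Each circulant block acts as a circular convolution, so its product with a
    sub-vector is an element-wise product in the Fourier domain.
    """
    p, q, k = w.shape
    x_blocks = x.reshape(q, k)
    w_hat = np.fft.rfft(w, axis=-1)          # shape (p, q, k//2 + 1)
    x_hat = np.fft.rfft(x_blocks, axis=-1)   # shape (q, k//2 + 1)
    # Sum the per-block products over the q column blocks.
    y_hat = np.einsum("pqf,qf->pf", w_hat, x_hat)
    return np.fft.irfft(y_hat, n=k, axis=-1).reshape(p * k)


rng = np.random.default_rng(0)
w = rng.standard_normal((2, 3, 4))   # 2 x 3 blocks, each 4 x 4 circulant
x = rng.standard_normal(12)
y_fast = block_circulant_matvec(w, x)
y_dense = block_circulant_dense(w) @ x
print(np.allclose(y_fast, y_dense))
```

The dense product is included only as a correctness check. In this sketch only the first columns `w` are stored, which is where the storage saving of the structured matrix comes from.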

Cited by 2 publications (3 citation statements)
References 32 publications

“…This enhanced efficiency can be credited to utilizing the block g-circulant matrix, a structured matrix falling within the category of low displacement rank (LDR) matrices [12]. This discovery corroborates earlier research detailed in [26,28,29], which similarly highlighted the benefits of both block circulant matrices and circulant matrices through experimental evidence - the block g-circulant matrices allowing us to leverage the concept of data-sparsity. Data sparsity implies that representing an n × n matrix necessitates fewer than O(n²) parameters.…”
Section: Results (supporting)
confidence: 80%
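As a rough illustration of the data-sparsity statement above, the sketch below compares parameter counts for a dense n × n matrix and a matrix partitioned into k × k circulant blocks, each stored by a single column. This block layout is an assumption made for illustration, the helper names are hypothetical, and the block g-circulant matrices of the citing paper may be parameterized differently.

```python
def dense_params(n):
    """A dense n x n matrix stores every entry."""
    return n * n


def block_circulant_params(n, k):
    """An n x n matrix split into (n/k)^2 circulant blocks of size k stores k values per block."""
    if n % k != 0:
        raise ValueError("block size k must divide n")
    blocks_per_side = n // k
    return blocks_per_side * blocks_per_side * k


n = 128
for k in (1, 8, 32, 128):
    print(k, dense_params(n), block_circulant_params(n, k))
```

Under this assumption the count is n²/k, so k = 1 recovers the dense matrix and k = n gives a single circulant matrix with n parameters.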
“…The size of the weight matrices is the combinations of n and m values such that a block g-circulant matrix of dimension 128 was obtained. Choosing a 128-dimensional matrix corresponds to the findings derived from [26]. We also experimented with other matrix dimensions for comparative purposes as part of our analysis.…”
Section: Data and Experimental Details (mentioning)
confidence: 99%