2022 | Preprint
DOI: 10.48550/arxiv.2202.10447
Transformer Quality in Linear Time

Cited by 8 publications (14 citation statements) | References 0 publications
“…They combine the strengths of CNNs (efficient training), RNNs (efficient inference), and continuous models (robustness to changes in sampling rate). LambdaNetworks [2], AFT [90], and FLASH [40] are other attempts at replacing attention in the context of image classification and language modeling.…”
Section: A Related Work
confidence: 99%
“…In this investigation, we consider that both global and local interactions are important for parameter efficiency, and we examine how to improve computing efficiency. We propose that a novel combination of ResNet [20] and GAU [21] will achieve the best of both worlds: a deep residual convolutional framework can progressively capture the relative-offset-based local correlations of audio sequences via a local receptive field, layer by layer, while a simpler, yet more performant, layer than other transformer architectures is used to learn content-based global interactions. The remainder of this paper is as follows: Section 2 points out the difficulties in applying speech recognition techniques in this field and describes the efforts of other researchers in applying these techniques to ATC.…”
Section: Introduction
confidence: 99%
“…Recently, Google proposed a new model architecture to address the quality and empirical speed issues of existing Transformer variants. This is achieved by combining the Attention layer and FFN into a single unit called GAU while reducing it to just one head (Hua et al., 2022). However, it is flawed in several details, such as the scaling factor and the replacement of softmax.…”
Section: Introduction
confidence: 99%
“…In February this year, Google proposed a new Transformer variant called FLASH (Hua et al., 2022), which is faster, has a lower VRAM footprint, and performs better. This is achieved by designing a performant layer named GAU (Gated Attention Unit), which combines the Attention layer and FFN.…”
confidence: 99%
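To make the GAU design described in these statements concrete, below is a minimal PyTorch sketch of a single-head Gated Attention Unit: gating and value projections U and V take over the FFN's role, Q and K are derived from a shared low-dimensional projection Z by cheap per-dimension scale-and-offset (which is what allows a single head), and squared ReLU stands in for softmax. The class and attribute names, the SiLU activations, the expansion width e = 2d, the head size s = 128, and the 1/n scaling are assumptions for illustration, not the authors' exact implementation; relative position bias is omitted, and this is the quadratic-attention form, not FLASH's chunked linear-time variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GAU(nn.Module):
    """Minimal single-head Gated Attention Unit, sketched from the FLASH
    paper's description (Hua et al., 2022). Names and default sizes
    (e = 2*d, s = 128) are this sketch's assumptions."""

    def __init__(self, d: int, s: int = 128):
        super().__init__()
        e = 2 * d                        # expanded gating/value width
        self.to_u = nn.Linear(d, e)      # gate branch (plays the FFN role)
        self.to_v = nn.Linear(d, e)      # value branch
        self.to_z = nn.Linear(d, s)      # shared low-dim base for Q and K
        # Q and K are per-dimension scale-and-offset views of Z,
        # so the unit needs only one attention head.
        self.gamma = nn.Parameter(torch.ones(2, s))
        self.beta = nn.Parameter(torch.zeros(2, s))
        self.to_out = nn.Linear(e, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d)
        n = x.shape[1]
        u = F.silu(self.to_u(x))                     # (b, n, e)
        v = F.silu(self.to_v(x))                     # (b, n, e)
        z = F.silu(self.to_z(x))                     # (b, n, s)
        q = z * self.gamma[0] + self.beta[0]         # (b, n, s)
        k = z * self.gamma[1] + self.beta[1]         # (b, n, s)
        # Squared ReLU replaces softmax; 1/n is one plausible scaling
        # (the exact scaling is the detail the statement above calls flawed).
        attn = F.relu(q @ k.transpose(-2, -1) / n) ** 2   # (b, n, n)
        return self.to_out(u * (attn @ v))           # gate, then project to d
```

A unit like `GAU(d=512)` maps a `(batch, n, 512)` tensor to the same shape, so it can drop in where an attention-plus-FFN block would sit. A quadratic GAU like this is only the building block: FLASH reaches linear time by applying quadratic attention within chunks and a linear-attention term across chunks.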