2021 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt48900.2021.9383581

Simplified Self-Attention for Transformer-Based End-to-End Speech Recognition

Abstract: Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies. However, such improvements are usually obtained through the use of very large neural networks. Transformer models mainly include two submodules: position-wise feedforward layers and self-attention (SAN) layers. In this paper, to reduce the model complexity while maintaining good performance, we propose a simplified self-attention…
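For orientation, here is a minimal PyTorch sketch of the two submodules the abstract names, a multi-head self-attention (SAN) layer and a position-wise feedforward layer, to show where the projection parameters that dominate model size live. The dimensions, pre-norm wiring, and dropout placement are illustrative assumptions, not the configuration used in the paper.

```python
# Sketch of a Transformer encoder block built from the two submodules named
# in the abstract: a self-attention (SAN) layer and a position-wise
# feedforward layer.  Sizes are illustrative, not the paper's configuration.
import torch
import torch.nn as nn

class TransformerEncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # Self-attention submodule: most of its parameters sit in the four
        # d_model x d_model projections (query, key, value, output).
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout,
                                               batch_first=True)
        # Position-wise feedforward submodule: two d_model <-> d_ff projections.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Pre-norm residual wiring, a common choice in speech Transformers.
        h = self.norm1(x)
        attn_out, _ = self.self_attn(h, h, h, key_padding_mask=key_padding_mask)
        x = x + self.dropout(attn_out)
        x = x + self.dropout(self.ffn(self.norm2(x)))
        return x
```

With these illustrative sizes, the four attention projections contribute roughly 1M parameters per layer and the feedforward pair roughly 2M, which is why both submodules are natural targets when trying to shrink the model.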

Cited by 28 publications (15 citation statements)
References 34 publications
“…The architecture was trained to perform both speech recognition and speech translation. Works such as those in [19][20][21] have investigated and proposed methods to improve the attention mechanism of the transformer for ASR. In [22], the authors conducted a series of experiments comparing RNNs against transformers on a number of ASR tasks and reported superior results with transformers.…”
Section: Existing Work On Transformers For ASR
confidence: 99%
“…Where all H, M, and W vectors have a size of 1024 (≈ denotes downstream operations for key/val). The above idea was inspired by another simplified attention implementation where they replaced the fully connected operations for Q, K and V with an element-wise multiplication with trainable parameters summed over time [8]. As can be seen in…”
Section: Alternate-attention Head Experiments
confidence: 99%
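The quoted description leaves the exact form of the summation open, so the following is a minimal sketch under the assumption that "summed over time" means summing each frame with a small local context window before the element-wise scaling. The class name ElementwiseQKV, the parameter vectors wq/wk/wv, and the window size are hypothetical illustrations, not the implementation of [8] or of the cited paper.

```python
# Hedged sketch: replace the fully connected Q/K/V projections with an
# element-wise multiplication by trainable vectors, applied to frames summed
# over a local time window.  All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElementwiseQKV(nn.Module):
    def __init__(self, d_model=512, context=3):
        super().__init__()
        # One trainable vector per projection instead of a d_model x d_model matrix.
        self.wq = nn.Parameter(torch.ones(d_model))
        self.wk = nn.Parameter(torch.ones(d_model))
        self.wv = nn.Parameter(torch.ones(d_model))
        self.context = context  # odd-sized window of frames summed per position

    def forward(self, x):
        # x: (batch, time, d_model)
        pad = self.context // 2
        # Pad the time axis, then sum each frame with its local context window.
        xp = F.pad(x, (0, 0, pad, pad))
        windows = xp.unfold(1, self.context, 1)   # (batch, time, d_model, context)
        summed = windows.sum(dim=-1)              # (batch, time, d_model)
        # Element-wise scaling replaces the usual fully connected projections.
        q = summed * self.wq
        k = summed * self.wk
        v = summed * self.wv
        return q, k, v
```

Whatever the exact summation, the parameter saving is the point: three vectors cost 3·d_model parameters where three full projections cost 3·d_model², which matches the complexity-reduction motivation in the abstract above.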
“…Artificial intelligence, and machine learning in particular, is a hot research area because of its numerous applications in various fields and contexts. Examples of applications include, but are not limited to: Natural Language Processing [1,2,3,4,5,6], Computer Vision [7,8,9,10], Game theory [11,12], Speech Recognition [13], Security [14,15,16,17,18,19,20,21,22,23,24], Medical diagnosis [25,26], Statistical Arbitrage [27], Network Anomaly Detection [28,29,30,31,32], Learning associations [33,34], Prediction [35,36,37,38,39], Extraction of information [40,41,42,43], Biometrics [44,45,46], Regression [47], Financial Services [48,…”
Section: Introduction
confidence: 99%