ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413493

Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition

Abstract: Self-attention (SA) based models have recently achieved significant performance improvements in hybrid and end-to-end automatic speech recognition (ASR) systems owing to their flexible context modeling capability. However, it is also known that the accuracy degrades when applying SA to long sequence data. This is mainly due to the length mismatch between the inference and training data because the training data are usually divided into short segments for efficient training. To mitigate this mismatch, we propos…
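The abstract is cut off before the proposed method is fully stated, so the following is only a rough, hedged illustration of what Gaussian kernelized self-attention generally looks like: the scaled dot-product score is replaced by a Gaussian kernel of the query-key Euclidean distance. The function name, the bandwidth parameter sigma, and the single-head NumPy formulation are assumptions made for illustration, not necessarily the paper's exact formulation.

import numpy as np

def gaussian_kernel_self_attention(q, k, v, sigma=1.0):
    # q, k, v: arrays of shape (T, d). Returns an array of shape (T, d_v).
    # Squared Euclidean distances ||q_i - k_j||^2 for all query/key pairs.
    sq_dist = (
        np.sum(q ** 2, axis=1, keepdims=True)   # (T, 1)
        + np.sum(k ** 2, axis=1)[None, :]       # (1, T)
        - 2.0 * (q @ k.T)                       # (T, T)
    )
    # Gaussian kernel scores: nearby query/key pairs receive larger weights,
    # so the score depends on distance rather than on a raw dot product.
    scores = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Row-normalize so each query's attention weights sum to one.
    weights = scores / np.sum(scores, axis=1, keepdims=True)
    return weights @ v

# Toy usage: a 6-frame sequence with 4-dimensional features.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
print(gaussian_kernel_self_attention(x, x, x, sigma=2.0).shape)  # (6, 4)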

Cited by 3 publications (2 citation statements)
References 26 publications
“…Although SAM is very beneficial, its complexity escalates quadratically with the length of the input sequence, severely hindering its exploration on longer sequences. Some proposals, such as Nyström matrix decomposition [11], kernel methods [12,13], hashing strategies [14], sparsification [15][16][17], and random projections [18], are effective at diminishing this complexity. Furthermore, Ref.…”
Section: Introduction
confidence: 99%
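The cited works [11-18] are not reproduced in this report, so the following is only a generic sketch of the kernel-method idea mentioned in the statement above: if the softmax score is replaced by a non-negative feature map phi, attention can be computed without ever forming the T x T score matrix, reducing the cost from quadratic to linear in the sequence length. The softplus feature map and the function name are illustrative assumptions, not any specific cited method.

import numpy as np

def linear_kernel_attention(q, k, v, phi=lambda x: np.log1p(np.exp(x))):
    # Kernelized attention in O(T d d_v) time instead of O(T^2 d).
    # out_i = phi(q_i) @ (sum_j phi(k_j) v_j^T) / (phi(q_i) . sum_j phi(k_j))
    pq, pk = phi(q), phi(k)        # (T, d) non-negative feature maps
    kv = pk.T @ v                  # (d, d_v): aggregated key-value statistics
    z = pq @ pk.sum(axis=0)        # (T,): per-query normalizers
    return (pq @ kv) / z[:, None]

# Toy usage on a 1000-frame sequence; no (1000 x 1000) matrix is ever built.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 8))
print(linear_kernel_attention(x, x, x).shape)  # (1000, 8)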
“…Positional encoding is an important part of the Transformer [14] and the bottom input component of the encoder and decoder stacks. Positional encoding has been applied in many tasks and has played an irreplaceable role [15,16,17,18,19,20,21]. At present, work on document layout analysis pays little attention to preserving the continuity of the predicted classification regions and retaining detailed image information, and in particular to the use of positional encoding for document layout analysis.…”
Section: Introduction
confidence: 99%
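As background for the mention of positional encoding in the statement above, here is a minimal NumPy version of the standard sinusoidal positional encoding from the Transformer [14]; the function name and the even d_model assumption are mine, but the sin/cos formulation itself follows the original definition.

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Returns a (seq_len, d_model) matrix that is added to the input embeddings.
    # Assumes d_model is even so sine and cosine columns interleave cleanly.
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)  # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

# Toy usage: encodings for a 50-frame sequence with 16-dimensional embeddings.
print(sinusoidal_positional_encoding(50, 16).shape)  # (50, 16)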