Interspeech 2021
DOI: 10.21437/interspeech.2021-415

Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition

Abstract: End-to-end models are favored in automatic speech recognition (ASR) because of their simplified system structure and superior performance. Among these models, Transformer and Conformer have achieved state-of-the-art recognition accuracy, in which self-attention plays a vital role in capturing important global information. However, the time and memory complexity of self-attention increases quadratically with the length of the sentence. In this paper, a prob-sparse self-attention mechanism is introduced into Conformer…
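To make the complexity point concrete, here is a minimal NumPy sketch (not taken from the paper) of ordinary scaled dot-product self-attention: the (L, L) score matrix it materialises is the term whose cost grows quadratically with sequence length, and it is what a prob-sparse mechanism avoids computing in full. Function and variable names here are ours, chosen for illustration.

```python
import numpy as np

def full_self_attention(x, w_q, w_k, w_v):
    """x: (L, d) frame sequence; w_q/w_k/w_v: (d, d) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (L, L) -- quadratic in L
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (L, d)

L, d = 1000, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((L, d))
w = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]
out = full_self_attention(x, *w)                  # allocates a 1000 x 1000 score matrix
```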


Cited by 10 publications (5 citation statements)
References: 14 publications
“…The range of top-u approximately holds as proved in [7]. Under the long-tail distribution, it has been empirically verified that randomly sampling U = L log L dot-product pairs to compute the query sparsity measure, then selecting the sparse top-u queries from it to form Q, yields acceptable results [9].…”
Section: Sparser Attention (mentioning)
confidence: 94%
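The sampling step quoted above can be sketched as follows. This is our illustration of the Informer-style query sparsity measure (max minus mean of the sampled scores); the function name and the sampling factor are assumptions, not the cited authors' code.

```python
import numpy as np

def topu_active_queries(q, k, factor=5, rng=None):
    """q, k: (L, d). Rank queries by the sparsity measure using only a
    logarithmic number of randomly sampled keys per query, and return
    the indices of the top-u queries."""
    rng = rng or np.random.default_rng()
    L, d = q.shape
    n_sample = min(L, int(factor * np.ceil(np.log(L))))   # keys sampled per query
    u = min(L, int(factor * np.ceil(np.log(L))))          # number of queries kept
    key_idx = rng.integers(0, L, size=(L, n_sample))      # random key subset per query
    sampled = np.einsum('ld,lsd->ls', q, k[key_idx]) / np.sqrt(d)  # (L, n_sample) scores
    sparsity = sampled.max(axis=1) - sampled.mean(axis=1)          # max - mean measure
    return np.argsort(sparsity)[-u:]
```

The total number of sampled dot products is roughly L times log L, which matches the U = L log L figure quoted above; only the returned top-u queries are then given full attention.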
“…A number of works have proposed new attention mechanisms that reduce the O(L²) time/space complexity to O(L log L) or even O(L) [8,9]. Motivated by [7], which models long sequences for time-series forecasting, we adapt the ProbSparse self-attention mechanism to replace the MHSA function in Equation 2.…”
Section: Sparser Attention (mentioning)
confidence: 99%
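As a rough picture of what replacing the MHSA function with ProbSparse attention amounts to, here is a hedged single-head sketch (ours, not the paper's implementation): only the selected active queries attend to all keys, and the remaining positions fall back to the mean of the values, following the Informer formulation. The active index set would come from a sparsity measure such as the one sketched earlier.

```python
import numpy as np

def prob_sparse_attention(q, k, v, active):
    """q, k, v: (L, d); active: indices of the top-u queries chosen by the
    sparsity measure. Computes a (u, L) score block instead of the full (L, L)."""
    L, d = q.shape
    # "Lazy" queries fall back to the mean of the values.
    out = np.repeat(v.mean(axis=0, keepdims=True), L, axis=0)
    scores = q[active] @ k.T / np.sqrt(d)                  # (u, L) score block
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    out[active] = w @ v                                    # full attention for active queries only
    return out
```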
“…Architectural modifications for Transformer-based ASR models have been of great interest. Many works focus on reducing the heavy computational cost caused by SA [20,21,22,23]. For example, Efficient Conformer [20] proposed grouped SA and a downsampling block to shorten the length of the sequence to be processed.…”
Section: Related Work (mentioning)
confidence: 99%
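For contrast with the prob-sparse route, the grouped-attention idea attributed to Efficient Conformer above can be pictured as concatenating neighbouring frames before attention, so that the quadratic term acts on a sequence shorter by the group size. The sketch below is purely illustrative; the group size, padding choice, and names are our assumptions, not the authors' code.

```python
import numpy as np

def group_frames(x, g):
    """x: (L, d) -> (ceil(L/g), g*d), zero-padding the tail if needed."""
    L, d = x.shape
    pad = (-L) % g
    x = np.concatenate([x, np.zeros((pad, d))], axis=0)
    return x.reshape(-1, g * d)

def ungroup_frames(y, d, L):
    """Inverse of group_frames, trimming any padding back to length L."""
    return y.reshape(-1, d)[:L]

x = np.random.default_rng(0).standard_normal((1000, 64))
grouped = group_frames(x, g=3)           # attention now sees ~334 positions, not 1000
restored = ungroup_frames(grouped, d=64, L=1000)
assert restored.shape == (1000, 64)
```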
“…It uses a convolution module to capture local context dependencies in addition to the long context captured by the self-attention module. The Conformer architecture has been investigated for different end-to-end systems such as attention encoder-decoder models [12,13] and the recurrent neural network transducer [10,14]. Nevertheless, there has been no work investigating the impact of using a Conformer AM for hybrid ASR systems.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%
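To show where the convolution module sits relative to self-attention in the block structure described above, here is a minimal structural sketch of a Conformer block, with residual half-step feed-forward layers around the self-attention and convolution modules and a final layer norm. The sub-modules are passed in as placeholder callables; this is a sketch of the standard block layout, not the cited systems' code.

```python
def conformer_block(x, ffn1, self_attn, conv_module, ffn2, layer_norm):
    """x: (L, d). Each argument is a callable implementing the named sub-module."""
    x = x + 0.5 * ffn1(x)        # first half-step feed-forward (macaron style)
    x = x + self_attn(x)         # multi-head self-attention: global context
    x = x + conv_module(x)       # convolution module: local context
    x = x + 0.5 * ffn2(x)        # second half-step feed-forward
    return layer_norm(x)         # final layer norm
```

In the paper summarised above, it is the self_attn slot of this structure that a prob-sparse mechanism would replace.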