2020
DOI: 10.48550/arxiv.2006.01712
Preprint

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

Cited by 6 publications (3 citation statements)
References 19 publications
“…A far-field set of about 15 hours of data and a common set of about 30 hours of data are used to evaluate the performance. Other configurations can be found in [21,28,29]. Real time factor (RTF) was used to measure the inference speed on GPU (NVIDIA Tesla V100).…”
Section: Methods (mentioning)
confidence: 99%
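The excerpt above reports inference speed via real-time factor. As a minimal illustrative sketch (an assumption, not code from the cited work), RTF is commonly computed as processing time divided by audio duration, so values below 1 mean faster-than-real-time decoding; `decode_fn` and `audio` are hypothetical stand-ins:

```python
import time

def real_time_factor(decode_fn, audio, audio_duration_s):
    """RTF for one utterance; decode_fn and audio are hypothetical stand-ins."""
    start = time.perf_counter()
    decode_fn(audio)                       # run the ASR model on the utterance
    elapsed = time.perf_counter() - start  # wall-clock processing time in seconds
    return elapsed / audio_duration_s      # RTF = processing time / audio length
```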
“…Recent years have seen the rapid development of end-to-end (E2E) models applied in the automatic speech recognition (ASR) field, including connectionist temporal classification (CTC) [8], the recurrent neural network transducer (RNN-T) [9,10] and attention-based encoder-decoder (AED) models [11,12,13], etc. Benefiting from the strong ability of multi-head attention (MHA) in global context modeling, the Transformer [11] is competitive among these models in non-streaming tasks; hence many variants have emerged [14,15,16] and become the mainstream of ASR research.…”
Section: Introduction (mentioning)
confidence: 99%
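For context on the multi-head attention mentioned in this excerpt, the following is a minimal NumPy sketch of scaled dot-product multi-head attention (an illustrative assumption, not the streaming attention variant proposed in the paper): every query position attends over the whole sequence, which is the global context modeling the excerpt refers to.

```python
import numpy as np

def multi_head_attention(x, num_heads, w_q, w_k, w_v, w_o):
    """x: (seq_len, d_model); w_*: (d_model, d_model) projection matrices."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(w):
        # Project and split into heads: (num_heads, seq_len, d_head)
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(w_q), split(w_k), split(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)        # (heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)              # softmax over keys
    heads = weights @ v                                         # (heads, seq, d_head)
    out = heads.transpose(1, 0, 2).reshape(seq_len, d_model)    # concatenate heads
    return out @ w_o                                            # output projection
```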
“…The encoder transforms raw acoustic features into a high-level representation, while the decoder predicts output symbols in an auto-regressive manner. Transformer-based models have dominated seq2seq modeling in the ASR field due to their superior recognition accuracy [7][8][9][10][11][12][13][14]. However, the large number of model parameters becomes the main challenge for deploying these ASR models in resource-constrained scenarios, where both memory and computation resources are limited.…”
Section: Introduction (mentioning)
confidence: 99%
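The auto-regressive decoding described in this excerpt can be summarized with a small greedy-decoding sketch; `decoder_step` is a hypothetical placeholder (an assumption, not an API from the cited paper) that maps the encoder output and the current symbol prefix to a next-symbol distribution.

```python
def greedy_decode(decoder_step, encoder_out, sos_id, eos_id, max_len=200):
    """Greedy auto-regressive decoding over a hypothetical decoder_step function."""
    prefix = [sos_id]                              # start-of-sequence token
    for _ in range(max_len):
        probs = decoder_step(encoder_out, prefix)  # next-symbol distribution
        next_id = max(range(len(probs)), key=probs.__getitem__)  # argmax symbol
        if next_id == eos_id:                      # stop at end-of-sequence
            break
        prefix.append(next_id)                     # feed prediction back as context
    return prefix[1:]                              # drop the <sos> token
```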