2020
DOI: 10.48550/arxiv.2001.08290
Preprint

Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

Cited by 3 publications (4 citation statements)
References 18 publications
“…To apply MPC in streaming models, the Transformer encoder needs to be restricted to using only information that has already appeared. Though some previous work [26,27] employed chunkwise splitting for streaming models, in this paper we simply change the self-attention mask of the Transformer encoder to make the whole model streamable. Specifically, we use a triangular matrix for the self-attention mask M in the encoder, where the upper-triangular part is set to −∞ and the other elements to 0.…”
Section: MPC For Streaming Models
confidence: 99%
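The triangular mask described in the excerpt above can be constructed directly. Below is a minimal PyTorch sketch, not code from the cited work; the function name and framework choice are illustrative. The mask has −∞ above the diagonal and 0 elsewhere, and is added to the attention scores before the softmax so each frame can attend only to itself and to earlier frames.

```python
import torch

def causal_attention_mask(seq_len: int) -> torch.Tensor:
    """Triangular self-attention mask as described in the excerpt:
    upper-triangular entries (future frames) are -inf, all others are 0.
    Adding it to the attention scores before the softmax restricts each
    frame to itself and to past frames."""
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    return torch.zeros(seq_len, seq_len).masked_fill(future, float("-inf"))

# Illustrative use inside scaled dot-product attention (q, k: [T, d_k]):
# scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
# weights = torch.softmax(scores + causal_attention_mask(q.size(0)), dim=-1)
```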
“…In (18), u_{i,k} denotes the pre-softmax activations. In (19), w denotes the chunk width and α_{i,k} denotes the attention weight within the chunk.…”
Section: B. Monotonic Chunk-wise Attention (MoChA)
confidence: 99%
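For context, MoChA restricts the soft attention to a chunk of width w ending at the frame selected by the monotonic attention mechanism. The sketch below shows only the simplified inference-time (hard-boundary) step under that assumption; the variable names and shapes are illustrative, and the differentiable training-time formulation referenced by equations (18)–(19) in the citing paper is more involved.

```python
import torch

def chunkwise_attention(u: torch.Tensor, boundary: int, w: int) -> torch.Tensor:
    """Softmax over a chunk of width w ending at the selected boundary.

    u        -- pre-softmax activations over encoder frames, shape [T]
                (analogous to u_{i,k} in the excerpt)
    boundary -- encoder frame chosen by the monotonic attention mechanism
    w        -- chunk width
    Returns attention weights that are zero outside the chunk.
    """
    start = max(0, boundary - w + 1)
    alpha = torch.zeros_like(u)
    alpha[start:boundary + 1] = torch.softmax(u[start:boundary + 1], dim=0)
    return alpha

# e.g. chunkwise_attention(torch.randn(20), boundary=12, w=4)
# attends only to encoder frames 9..12.
```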
“…Although the hybrid CTC/attention end-to-end ASR architecture is reaching reasonable performance [16]-[19], how to deploy it in online scenarios remains an unsolved problem. After inspecting the CTC/attention ASR architecture, we identify four challenges in deploying online hybrid CTC/attention end-to-end ASR systems:…”
Section: Introduction
confidence: 99%
“…As for language modelling, Transformer-based architectures have achieved very promising results [9,10], though Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) are still in broad use [11]. Apart from hybrid systems, end-to-end systems have received great attention in recent years, including a number of proposals for low-latency streaming decoding [12][13][14]. However, despite their simplicity and promising prospects, it is still unclear whether they will soon surpass state-of-the-art hybrid systems that combine independent models trained on vast amounts of data.…”
Section: Introduction
confidence: 99%