Interspeech 2019
DOI: 10.21437/interspeech.2019-2018

Online Hybrid CTC/Attention Architecture for End-to-End Speech Recognition

Abstract: The hybrid CTC/attention end-to-end automatic speech recognition (ASR) system combines a CTC-based ASR system and an attention-based ASR system into a single neural network. Although the hybrid CTC/attention system takes advantage of both the CTC and attention architectures in training and decoding, it remains difficult to use for streaming speech recognition because of its attention mechanism, CTC prefix probability and bidirectional encoder. In this paper, we propose a stable monotonic chunkwise attention (sMoChA) to stream its…
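
For context, the hybrid CTC/attention framework described in the abstract is typically trained with a weighted multi-task objective over a shared encoder; a standard formulation (stated here as an assumption, not quoted from this paper) is

$$\mathcal{L}_{\mathrm{MTL}} = \lambda\,\mathcal{L}_{\mathrm{CTC}} + (1-\lambda)\,\mathcal{L}_{\mathrm{att}}, \qquad \lambda \in [0,1],$$

where $\mathcal{L}_{\mathrm{CTC}}$ and $\mathcal{L}_{\mathrm{att}}$ are the CTC and attention losses, respectively.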

Cited by 50 publications (50 citation statements) · References 20 publications

“…The hyperparameters λ and γ are tunable. For online decoding, we proposed the DWJD algorithm [10] to 1) coordinate the forward propagation in the encoder and the beam search in the decoder; and 2) address the unsynchronized predictions of the sMoChA-based decoder and the CTC outputs. MTA [11], which performs attention on top of the truncated historical encoder outputs, outperforms sMoChA by exploiting longer history.…”
Section: Online CTC/Attention E2E Architecture
Confidence: 99%
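The excerpt does not reproduce the score that λ and γ weight. In hybrid CTC/attention joint decoding, a common form (an assumption here, not necessarily the exact formula of [10]) is a log-linear combination of the CTC and attention scores plus a length reward applied during beam search:

$$\log P(Y \mid X) = \lambda \log P_{\mathrm{CTC}}(Y \mid X) + (1-\lambda)\,\log P_{\mathrm{att}}(Y \mid X) + \gamma\,|Y|,$$

where λ balances the two branches and γ rewards hypothesis length.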
“…tend to zero for a certain output step, the weak attention weights would inevitably be passed down to all the following steps [24], which results in vanishing gradients during backpropagation. To remedy this, sMoChA [22] is proposed to simplify Eq. (7) to:…”
Section: Transformer for Online ASR
Confidence: 99%
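Eq. (7) itself is truncated in the excerpt, so the following is only a minimal sketch of the kind of simplification sMoChA makes: each output step computes its monotonic selection weights from its own selection probabilities alone, dropping the recursive dependence on the previous output step's weights that causes the vanishing effect described above. All names are illustrative.

```python
import numpy as np

def smocha_style_weights(p):
    """Monotonic selection weights computed independently per output step.

    p: 1-D array of selection probabilities p_{i,j} for one decoder step i
       over encoder frames j = 0..T-1.

    alpha[j] = p[j] * prod_{k<j}(1 - p[k]) is the probability that frame j
    is the first frame selected at this output step. Unlike the MoChA/HMA
    recursion, nothing is carried over from the previous output step, so
    weights that decay toward zero at one step cannot suppress later steps.
    """
    survive = np.concatenate(([1.0], np.cumprod(1.0 - p[:-1])))
    return p * survive

# Toy example with selection probabilities for four encoder frames.
print(smocha_style_weights(np.array([0.1, 0.2, 0.6, 0.9])))
```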
“…MTA [22] further simplifies the workflow of attention computation, in which the computation of the attention weights and the context vector is unified for both training and inference, using Eq. (12) and Eq. (8), respectively.…”
Section: Transformer for Online ASR
Confidence: 99%
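The referenced Eq. (12) and Eq. (8) are not reproduced in the excerpt. As an illustration of the unified workflow only, the context vector in a truncated-attention scheme of this kind can be written as a weighted sum over the historical encoder outputs up to a truncation point (a generic form, not necessarily the cited paper's exact equations):

$$c_i = \sum_{j=1}^{t_i} \alpha_{i,j}\, h_j,$$

where $h_j$ are encoder outputs, $t_i$ is the truncation point for output step $i$, and $\alpha_{i,j}$ are attention weights computed over frames $1 \le j \le t_i$ in both training and inference.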
“…To overcome this limitation, online attention mechanisms have been proposed, and can be broadly classified into the following categories: i) Bernoulli-based attention methods formulate the triggering of outputs as a stochastic process, and the decision is sampled from the attending probabilities. These include Hard Monotonic Attention (HMA) [11], Monotonic Chunkwise Attention (MoChA) [12,13] and Monotonic Truncated Attention (MTA) [14]; ii) Triggered attention methods are conditioned on the forced alignment produced by CTC [15,16]; iii) Accumulation-based attention methods accumulate the attention weights along encoding timesteps and the computation is halted once the sum reaches a certain threshold. These include Adaptive Computation Steps (ACS) [17,18] and Decoder-end Adaptive Computation Steps (DACS) [19].…”
Section: Introduction
Confidence: 99%
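To make the accumulation-based category concrete, here is a minimal sketch of the halting rule it describes: per-frame probabilities are accumulated along the encoder timesteps and computation stops once the running sum crosses a threshold. The function name and threshold value are illustrative and not taken from the ACS/DACS papers.

```python
import numpy as np

def accumulation_halt_index(halting_probs, threshold=1.0):
    """Return the encoder timestep at which attention computation halts.

    halting_probs: 1-D array of per-frame halting probabilities for one
                   output step.
    threshold:     accumulation threshold (illustrative value).

    The running sum of halting probabilities is accumulated along the
    encoder timesteps; once it reaches the threshold, no further frames
    are attended for this output step.
    """
    running = np.cumsum(halting_probs)
    halted = np.nonzero(running >= threshold)[0]
    return int(halted[0]) if halted.size else len(halting_probs) - 1

# Example: the sum 0.2 + 0.3 + 0.4 + 0.3 first reaches 1.0 at index 3.
print(accumulation_halt_index(np.array([0.2, 0.3, 0.4, 0.3, 0.1])))  # -> 3
```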