2020
DOI: 10.48550/arxiv.2007.13802
Preprint
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

Cited by 6 publications (14 citation statements) | References 14 publications
“…All models are trained using the Adam optimizer [23], with a learning rate schedule including an initial linear warm-up phase, a constant phase, and an exponential decay phase [4]. All the baseline models and proposed methods use the same training strategy.…”
Section: Methods
Confidence: 99%
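The three-phase schedule described in the quotation (linear warm-up, constant hold, exponential decay) can be sketched as a simple step-to-rate function. All hyper-parameter values below (`peak_lr`, `warmup_steps`, `hold_steps`, `decay_rate`) are hypothetical placeholders, not values reported by the cited papers:

```python
def learning_rate(step, peak_lr=1e-3, warmup_steps=1000,
                  hold_steps=2000, decay_rate=0.9999):
    """Return the learning rate at a given training step.

    Phases: linear warm-up -> constant hold -> exponential decay.
    Hyper-parameter values here are illustrative only.
    """
    if step < warmup_steps:
        # Linear warm-up from 0 toward peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + hold_steps:
        # Constant phase at the peak rate.
        return peak_lr
    # Exponential decay after the hold phase ends.
    return peak_lr * decay_rate ** (step - warmup_steps - hold_steps)
```

In practice such a function is passed to the optimizer as a per-step scheduler; the warm-up phase stabilizes early Adam updates, while the decay phase anneals the rate for convergence.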
See 1 more Smart Citation
“…All models are trained using the Adam optimizer [23], with a learning rate schedule including an initial linear warm-up phase, a constant phase, and an exponential decay phase [4]. All the baseline models and proposed methods use the same training strategy.…”
Section: Methodsmentioning
confidence: 99%
“…Recent application of recurrent neural network transducers (RNN-T) has achieved significant progress in the area of online streaming end-to-end automatic speech recognition (ASR) [1][2][3][4]. However, building an accent-robust system remains a big challenge.…”
Section: Introduction
Confidence: 99%
“…We leave the effect of optimizing interpolation weights for best overall perplexity of OOD data as future work. [17,18]. For shallow fusion with a WFST, we use the lookahead approach described in [19] as it avoids unnecessary arc expansion and provides a heuristic approach to perform subword-level rescoring without the need to build the boosting FST directly at the subword level.…”
Section: N-gram Pruning
Confidence: 99%
“…E2E models are commonly trained to maximize the log posteriors of token sequences given speech sequences while the ASR performance is measured by the word error rate (WER). Therefore, a minimum WER (MWER) criterion was proposed to train CTC [14], AED [15], RNN-T [16,17] and hybrid autoregressive transducer (HAT) [18] models, leading to improved ASR performance.…”
Section: Introduction
Confidence: 99%
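The MWER criterion mentioned in the quotation minimizes the expected number of word errors over an N-best list of hypotheses, with hypothesis probabilities renormalized over that list. A minimal sketch of that expected-error computation follows; the function name and inputs are illustrative, not the authors' implementation (which operates on model logits and typically subtracts the mean error from each hypothesis's error as a variance-reducing baseline in the gradient):

```python
import math

def expected_wer_loss(hyp_log_scores, hyp_word_errors):
    """Expected word-error count over an N-best list (illustrative).

    hyp_log_scores : unnormalized log-probabilities of each hypothesis.
    hyp_word_errors: word-level edit distance of each hypothesis
                     to the reference transcript.
    """
    # Renormalize scores over the N-best list (stable softmax).
    m = max(hyp_log_scores)
    exps = [math.exp(s - m) for s in hyp_log_scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Expected number of word errors under the renormalized distribution.
    return sum(p * err for p, err in zip(probs, hyp_word_errors))
```

For example, two equally scored hypotheses with 2 and 0 word errors yield an expected error of 1.0; training pushes probability mass toward low-error hypotheses.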