ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053573
The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

Abstract: We present a complete training pipeline to build a state-of-the-art hybrid HMM-based ASR system on the 2nd release of the TED-LIUM corpus. Data augmentation using SpecAugment is successfully applied to improve performance on top of our best SAT model using i-vectors. By investigating the effect of different maskings, we achieve improvements from SpecAugment on hybrid HMM models without increasing model size and training time. A subsequent sMBR training is applied to fine-tune the final acoustic model, and both …
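As a rough illustration of the augmentation the abstract describes, the following is a minimal NumPy sketch of SpecAugment-style time and frequency masking applied to a log-mel feature matrix. It is not the authors' implementation; the mask counts and maximum widths are illustrative assumptions rather than the hyper-parameters tuned in the paper.

# Minimal sketch of SpecAugment-style masking (illustrative parameters).
import numpy as np

def spec_augment(features, num_time_masks=2, max_time_width=20,
                 num_freq_masks=2, max_freq_width=8, mask_value=0.0):
    """features: (num_frames, num_freq_bins) array; returns a masked copy."""
    augmented = features.copy()
    num_frames, num_bins = augmented.shape

    # Time masking: overwrite random contiguous blocks of frames.
    for _ in range(num_time_masks):
        width = np.random.randint(0, max_time_width + 1)
        start = np.random.randint(0, max(1, num_frames - width))
        augmented[start:start + width, :] = mask_value

    # Frequency masking: overwrite random contiguous blocks of feature bins.
    for _ in range(num_freq_masks):
        width = np.random.randint(0, max_freq_width + 1)
        start = np.random.randint(0, max(1, num_bins - width))
        augmented[:, start:start + width] = mask_value

    return augmented

# Example: mask a random 300-frame, 80-dimensional feature matrix.
feats = np.random.randn(300, 80).astype(np.float32)
masked = spec_augment(feats)
print(masked.shape)  # (300, 80)

Since masking only overwrites parts of the existing features, it adds no parameters and essentially no training cost, which is consistent with the abstract's point that the gains come without increasing model size or training time.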

Cited by 42 publications (27 citation statements) | References 20 publications
“…E2E-Transformer (ID 9). We could not achieve any improvements on the DNN-HMM model (ID 2) using the SpecAugment despite trying different hyper-parameter tuning recommendations (Zhou et al, 2020). Overall, the best CER and WER results are achieved by the E2E-Transformer (ID 10) followed by the E2E-RNN (ID 6) and then the DNN-HMM (ID 1).…”
Section: Results
Mentioning confidence: 98%
“…We also tried Specaugment, based on random on-the-fly time and feature masking used widely in seq2seq training [21]. All models are trained with the same masking parameters, following indications from [22].…”
Section: Training and Decoding
Mentioning confidence: 99%
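The on-the-fly masking mentioned in the statement above could, for example, be realised with torchaudio's built-in masking transforms. This is a hedged sketch of one possible setup, not the cited systems' configuration, and the freq_mask_param/time_mask_param values are assumptions.

# One possible on-the-fly masking pipeline using torchaudio (illustrative values).
import torch
import torchaudio.transforms as T

masking = torch.nn.Sequential(
    T.FrequencyMasking(freq_mask_param=8),  # mask up to 8 feature bins per call
    T.TimeMasking(time_mask_param=20),      # mask up to 20 frames per call
)

# torchaudio's masking transforms expect (..., freq, time) tensors.
features = torch.randn(1, 80, 300)  # (batch, feature bins, frames)
augmented = masking(features)       # fresh random masks on every call
print(augmented.shape)              # torch.Size([1, 80, 300])

Because new masks are drawn each time the transform is applied, placing it in the data pipeline gives the "on-the-fly" behaviour the statement refers to: every epoch sees differently masked versions of the same utterances.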
“…For SWBD, the Hub5'00 and Hub5'01 datasets are used as dev and test set, respectively. All the LMs used are the same as in [4] for TLv2 and [28] (sentence-wise) for SWBD. By default, word error rate (WER) results are obtained with full-sum decoding and a 4gram LM.…”
Section: Experiments 3.1 Setup
Mentioning confidence: 99%