2022
DOI: 10.48550/arxiv.2210.05793
Preprint

Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

Abstract: Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data. In this paper, we focus on knowledge distillation for the RNN-T model, which is widely used in state-of-the-art (SoTA) automatic speech recognition (ASR). Specifically, we compared using soft and hard target distillation to train large-scale RNN-T models on the LibriSpeech/LibriLight public dataset (60k hours) and our in-house data (600k hours).…
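
To make the soft/hard distinction in the abstract concrete, here is a minimal sketch for a single prediction step. It is a generic classification-style simplification, not the paper's RNN-T formulation (which works over the full alignment lattice, or over teacher transcripts used as pseudo-labels). The function names and the example logits are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(teacher_logits, student_logits):
    """Soft target: KL(teacher || student) over the full distribution."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hard_target_loss(teacher_logits, student_logits):
    """Hard target: cross-entropy against the teacher's top-1 pseudo-label."""
    q = softmax(student_logits)
    label = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    return -math.log(q[label])


# Hypothetical logits over a three-token vocabulary (illustration only).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
print("soft-target loss:", soft_target_loss(teacher, student))
print("hard-target loss:", hard_target_loss(teacher, student))
```

In this simplified view, the soft target passes the teacher's whole output distribution to the student, while the hard target keeps only the teacher's top choice and treats it as a label.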
