2021
DOI: 10.48550/arxiv.2106.06126
Preprint
Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models

Abstract: We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM), with experiments spanning over 3,000 hours of GPU time, making our study one of the largest of its kind. We discuss SSL for AMs in a small-footprint setting, showing that a smaller-capacity model trained with 1 million hours of unsupervised data can outperform a baseline supervised system by a 14.3% word error rate reduction (WERR). When the supervised data is increased seven-fold, the gains diminish to 7.1% WERR…
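The teacher-student training the abstract refers to has a simple core: a large teacher model labels unsupervised audio with soft posteriors, and a small on-device student is trained to match them. The sketch below is a toy, stdlib-only illustration of that objective; the function names, the temperature value, and the single-frame setup are illustrative assumptions, not details from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert a list of logits to a posterior distribution at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's posteriors against the teacher's soft
    targets -- the per-frame objective in teacher-student SSL, computed on
    unlabeled audio where no transcription is available."""
    teacher_post = softmax(teacher_logits, temperature)
    student_post = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_post, student_post))

# Toy example: one frame of output logits from a large teacher and a small
# student. The loss is smaller when the student matches the teacher's
# distribution than when it contradicts it.
teacher = [2.0, 0.5, -1.0]
aligned_student = [2.0, 0.5, -1.0]
misaligned_student = [-1.0, 0.5, 2.0]
assert distillation_loss(teacher, aligned_student) < distillation_loss(teacher, misaligned_student)
```

In practice the student in the paper's setting is a smaller-capacity acoustic model and the loss is accumulated over millions of hours of teacher-labeled frames; the scalar example above only shows the shape of the objective.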

Cited by 1 publication (1 citation statement)
References 39 publications
“…Automatic Speech Recognition (ASR) enables fast and accurate transcriptions of voice commands and dictations on edge devices; examples of on-device ASR applications include dictation for Google keyboard [1,2], voice commands for Apple Siri [3], and Amazon Alexa [4], etc. Past work developed streaming End-to-End (E2E) all-neural ASR models that run compactly on edge devices [2,5,6,7].…”
Section: Introduction
confidence: 99%