2019
DOI: 10.1109/jetcas.2019.2912353
Realizing Petabyte Scale Acoustic Modeling

Abstract: Large scale machine learning (ML) systems such as the Alexa automatic speech recognition (ASR) system continue to improve with increasing amounts of manually transcribed training data. Instead of scaling manual transcription to impractical levels, we utilize semi-supervised learning (SSL) to learn acoustic models (AM) from the vast firehose of untranscribed audio data. Learning an AM from 1 Million hours of audio presents unique ML and system design challenges. We present the design and evaluation of a highly …

Cited by 10 publications (9 citation statements)
References: 39 publications
“…(i) Reducing time and cost to train DNN models: We believe that communication times will bottleneck training times of distributed systems and this will become even more severe with recent significant improvements in the computational capability of deep learning training hardware. To address this bottleneck, in the past few years, compression techniques have been eagerly researched and implemented in some practical training systems [43]. Meanwhile, we would like to point out, although our compression scheme guarantees theoretical convergence and shows no accuracy loss compared to baseline training over the tested models and applications, there could still be concerns about the impact of lossy gradient compression on neural network convergence performance.…”
Section: Broader Impact
Mentioning; confidence: 99%
“…Our research results on compression in large-scale distributed training have two broad benefits:(i) Reducing time and cost to train DNN models: We believe that communication times will bottleneck training times of distributed systems and this will become even more severe with recent significant improvements in the computational capability of deep learning training hardware. To address this bottleneck, in the past few years, compression techniques have been eagerly researched and implemented in some practical training systems[43]. Our research results on scalability of gradient compression aim to push this to larger scale distributed training systems, which is needed for the training of expensive and powerful gigantic models.…”
Mentioning; confidence: 99%
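The citing work's exact compression scheme is not reproduced in these statements; the sketch below is only a minimal illustration of the general idea behind lossy gradient compression, using top-k sparsification with error feedback on a single simulated worker. The function names, the 1% compression ratio, and the gradient shape are illustrative assumptions, not taken from the cited paper.

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Returns sparse (index, value) pairs to transmit; everything else is
    dropped and, in this sketch, carried over as local residual error.
    """
    flat = grad.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest entries
    return idx, flat[idx]

def decompress(idx, values, shape):
    """Rebuild a dense gradient from the sparse (index, value) pairs."""
    flat = np.zeros(int(np.prod(shape)))
    flat[idx] = values
    return flat.reshape(shape)

# One simulated synchronous step with error feedback on a single worker.
rng = np.random.default_rng(0)
grad = rng.normal(size=(256, 128))       # stand-in for a local gradient
residual = np.zeros_like(grad)           # error accumulated from past rounds

corrected = grad + residual              # add back what was dropped before
idx, vals = topk_compress(corrected, ratio=0.01)
sent = decompress(idx, vals, grad.shape) # the dense view of what is transmitted
residual = corrected - sent              # remember what was dropped this round

print("transmitted entries:", vals.size, "of", grad.size)
```

Error feedback is one common way such schemes preserve convergence in practice: the dropped mass is not discarded but re-added to the next gradient before compression.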
“…Representative work includes [5][6] from Microsoft, [4] from Amazon, [23] from Baidu. The global model updates can be conducted either through gradient aggregation [5][23] or model averaging [6].…”
Section: B. Centralized Distributed Training
Mentioning; confidence: 99%
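To make the distinction in this statement concrete, here is a minimal sketch contrasting the two global-update styles it names, gradient aggregation versus model averaging, on synthetic data. The worker count, learning rate, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, dim, lr = 4, 8, 0.1

# Shared starting point and per-worker local gradients (stand-ins for real data).
w = rng.normal(size=dim)
local_grads = [rng.normal(size=dim) for _ in range(n_workers)]

# (a) Gradient aggregation: average the workers' gradients, take one global step.
g_avg = np.mean(local_grads, axis=0)
w_grad_agg = w - lr * g_avg

# (b) Model averaging: each worker steps locally, then the models are averaged.
local_models = [w - lr * g for g in local_grads]
w_model_avg = np.mean(local_models, axis=0)

# With a single local step the two coincide; they diverge once workers take
# multiple local steps between synchronizations.
print(np.allclose(w_grad_agg, w_model_avg))  # True for one local step
```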
“…Recently, student-teacher distillation techniques for hybrid HMM-LSTM models have been shown to scale to very large data sets (1 million hours) for models with high capacity [28,27]. The efficacy of model compression using student-teacher distillation is well established [23,35,34].…”
Section: Introduction
Mentioning; confidence: 99%
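As an illustration of the student-teacher distillation objective this statement refers to, the sketch below computes the cross-entropy of student posteriors against teacher posteriors on synthetic logits. The temperature parameter, class count, and function names are assumptions; the cited papers' exact objectives may differ.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of student posteriors against teacher posteriors.

    The teacher's (optionally temperature-smoothed) class posteriors act as
    soft targets for the student, e.g. on untranscribed audio frames.
    """
    teacher_post = softmax(teacher_logits / temperature)
    student_logp = np.log(softmax(student_logits / temperature) + 1e-12)
    return -np.mean(np.sum(teacher_post * student_logp, axis=-1))

# Toy example: 32 frames, 500 output classes (real acoustic models use thousands).
rng = np.random.default_rng(2)
teacher = rng.normal(size=(32, 500))
student = rng.normal(size=(32, 500))
print("distillation loss:", distillation_loss(student, teacher))
```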