2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2012.6288827

Distributed acoustic modeling with back-off n-grams

Abstract: The paper proposes an approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and the model size (as measured by the number of parameters in the model) to approximately 100 times larger than current sizes used in ASR. Dealing with unseen phonetic contexts is accomplished using the familiar back-off technique from language modeling, chosen for its implementation simplicity. The new acoustic model is estimated and stored using the MapReduce distrib…
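As a rough illustration of the back-off idea the abstract describes, below is a minimal Python sketch. All names, the dictionary storage, and the symmetric context-shrinking order are assumptions for illustration, not the paper's actual scheme; it simply looks up acoustic parameters for a phone in progressively shorter phonetic context windows until a trained entry is found:

```python
# Hedged sketch: back-off over phonetic contexts, by analogy with
# back-off n-gram language models. The model is a plain dict mapping
# (left_context, phone, right_context) -> parameters; the key layout
# and the parameter payload are hypothetical.

def backoff_lookup(model, phone, left, right):
    """Return parameters for `phone`, backing off to shorter context
    windows when the full context was unseen in training."""
    # Try the widest symmetric context first, then shrink to width 0
    # (the context-independent entry). Width 0 yields empty tuples.
    for width in range(min(len(left), len(right)), -1, -1):
        key = (left[len(left) - width:], phone, right[:width])
        if key in model:
            return model[key]
    raise KeyError(f"no entry for phone {phone!r} at any back-off level")

# Toy usage: one seen triphone plus a context-independent fallback.
model = {
    (("s",), "ih", ("t",)): "params for s-ih+t",
    ((), "ih", ()): "params for ih",
}
backoff_lookup(model, "ih", ("s",), ("t",))  # exact triphone hit
backoff_lookup(model, "ih", ("k",), ("l",))  # unseen context, backs off
```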

Cited by 6 publications (6 citation statements) · References 6 publications
“…In [12], Ignacio et al. used Deep Neural Networks (DNNs), a state-of-the-art machine learning technique, to identify the language spoken by smartphone users. There are also a number of theoretical papers focusing on improving the computational efficiency of the training process for speech recognition [13][14][15]. These works are quite different from ours, since AccelWord relies on an accelerometer to monitor voice signals and targets only hotword detection.…”
Section: Related Work
confidence: 98%
“…Note that the other dominant factor in the slow adoption of voice control applications is inaccuracy in speech recognition after the hotword is detected. However, since there is already a plethora of research [11][12][13][14][15] on this topic, we do not consider complete speech recognition in this work and simply focus on hotword detection.…”
Section: Design Goals and Challenges
confidence: 99%
“…Discriminative language models (DLMs) [14] aim at directly optimizing word error rate by rewarding features that appear in low-error hypotheses and penalizing features in misrecognized hypotheses. Since the estimation of discriminative LMs is computationally more intensive than that of regular n-gram LMs, one has to use distributed learning algorithms and a supporting parallel computing infrastructure [16].…”
Section: Natural Language Prediction
confidence: 99%
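For concreteness, the quoted idea, rewarding features of low-error hypotheses and penalizing features of misrecognized ones, can be written as a perceptron-style update. The sketch below is an assumed illustration; the bigram featurizer, function names, and learning rate are hypothetical, and the cited works' exact estimation procedures may differ:

```python
# Hedged sketch of one discriminative-LM perceptron update: move the
# feature weights toward the low-error (oracle) hypothesis and away
# from the currently top-scoring, misrecognized hypothesis.
from collections import Counter

def ngram_features(words, n=2):
    # Hypothetical featurizer: word n-gram counts of a hypothesis.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def perceptron_update(weights, top_hyp, oracle_hyp, lr=1.0):
    for feat, cnt in ngram_features(oracle_hyp).items():
        weights[feat] = weights.get(feat, 0.0) + lr * cnt   # reward
    for feat, cnt in ngram_features(top_hyp).items():
        weights[feat] = weights.get(feat, 0.0) - lr * cnt   # penalize
    return weights
```

In a distributed setting of the kind [16] refers to, such per-example updates are typically run independently per data shard and the resulting weight vectors averaged between passes; that mixing step is omitted here.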
“…Performance has been measured in terms of Precision at 1 (P@1), that is, the percentage of correctly transcribed sentences ranked in the first position, and Word Error Rate (WER). All audio files are analyzed through the official Google ASR APIs [21]. In order to reduce the evaluation bias due to ASR errors, only those commands with an available solution among the 5 input candidates were retained for the experiments.…”
Section: Experimental Evaluations
confidence: 99%
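Both quoted metrics are standard and compact enough to state exactly. A small sketch follows (function names are illustrative): P@1 as an exact-match rate over top-ranked hypotheses, and WER as word-level Levenshtein distance normalized by reference length:

```python
# Hedged sketch of the two evaluation metrics quoted above.

def word_error_rate(ref, hyp):
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of reference words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

def precision_at_1(references, top_hypotheses):
    """Fraction of utterances whose first-ranked hypothesis is an
    exact transcript match."""
    pairs = list(zip(references, top_hypotheses))
    return sum(r == h for r, h in pairs) / len(pairs)
```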