2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960579

Unsupervised acoustic and language model training with small amounts of labelled data

Abstract: We measure the effects of a weak language model, estimated from as little as 100k words of text, on unsupervised acoustic model training and then explore the best method of using word confidences to estimate n-gram counts for unsupervised language model training. Even with 100k words of text and 10 hours of training data, unsupervised acoustic modeling is robust, with 50% of the gain recovered when compared to supervised training. For language model training, multiplying the word confidences together to get a …
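The truncated sentence above refers to forming n-gram counts from automatic transcripts by combining per-word confidences. A minimal Python sketch of one such scheme, multiplying the confidences of the words in each n-gram to obtain a fractional count, is given below; the function name, data layout, and example values are illustrative assumptions rather than the paper's implementation.

```python
from collections import defaultdict

def fractional_ngram_counts(hypotheses, n=3):
    """Estimate n-gram counts from ASR hypotheses with per-word confidences.

    `hypotheses` is a list of decoded utterances, each a list of
    (word, confidence) pairs. For every n-gram, the confidences of its
    words are multiplied together and the product is accumulated as a
    fractional count, so unreliable regions contribute less to the LM.
    """
    counts = defaultdict(float)
    for utt in hypotheses:
        for i in range(len(utt) - n + 1):
            ngram = tuple(word for word, _ in utt[i:i + n])
            conf = 1.0
            for _, c in utt[i:i + n]:
                conf *= c
            counts[ngram] += conf
    return counts

# Example: two hypothetical decoded utterances with word-level confidences.
hyps = [
    [("the", 0.9), ("cat", 0.8), ("sat", 0.95)],
    [("the", 0.7), ("cat", 0.6), ("ran", 0.5)],
]
print(fractional_ngram_counts(hyps, n=2))
```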

Cited by 52 publications (36 citation statements) | References 6 publications
“…For more than a decade, the term unsupervised training in acoustic modeling has referred to using lightly supervised models to generate noisy transcriptions for unannotated speech, which are fed back for subsequent retraining [11,12]. In the past five years, however, truly unsupervised subword acoustic model training has been attempted using various bottom-up strategies, including Gaussian mixture-based universal background models [7], successive state splitting algorithms for hidden Markov models (HMM) [13], traditional estimation of subword HMMs [14], discriminative clustering objectives [15], and non-parametric Bayesian estimation of HMMs [16].…”
Section: Related Work (mentioning)
confidence: 99%
“…One typical set of examples are the self-training methods [12,13,14,15]. In this method the transcribed data is used to construct a seed model, with which to decode the untranscribed data at the second stage.…”
Section: Introduction (mentioning)
confidence: 99%
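The self-training recipe quoted above (build a seed model from the transcribed data, then decode the untranscribed data with it) can be summarised in a short sketch. The loop below is a generic illustration under assumed interfaces: `train_fn` and `decode_fn` are hypothetical callables standing in for whatever acoustic-model toolkit is used, not APIs from the cited systems.

```python
def self_train(train_fn, decode_fn, labelled, unlabelled, rounds=1):
    """Generic self-training loop.

    train_fn:  callable mapping a list of (audio, transcript) pairs to a model
    decode_fn: callable mapping (model, audio) to a hypothesised transcript
    labelled:  list of (audio, transcript) pairs with manual transcriptions
    unlabelled: list of audio-only utterances
    """
    model = train_fn(labelled)  # seed model from the transcribed data
    for _ in range(rounds):
        # Decode the untranscribed data with the current model and feed the
        # automatic (noisy) transcripts back as additional training material.
        auto = [(audio, decode_fn(model, audio)) for audio in unlabelled]
        model = train_fn(labelled + auto)
    return model
```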
“…These self-training methods have been studied in GMM-based acoustic models [76,77,48,146,105,153]. In recent studies [136,45,60,82], self-training methods are also used in DNN-based acoustic model training.…”
Section: Initial Labeled Data (mentioning)
confidence: 99%
“…Following the conventional self training [76,77,48,146,105,153] approach, we first train an initial DNN-HMM system using the training data, and decode on the test data. For data selection, we pick utterances with the highest average per-frame decoding likelihood and add them to the training data.…”
Section: Comparison To Self Training (mentioning)
confidence: 99%
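The data-selection step quoted above, picking utterances with the highest average per-frame decoding likelihood, can be illustrated as follows. The field names (`loglik`, `num_frames`) and the `top_fraction` threshold are assumptions made for the sketch, not details from the cited work.

```python
def select_by_avg_frame_likelihood(decoded, top_fraction=0.5):
    """Pick utterances with the highest average per-frame decoding likelihood.

    `decoded` is a list of dicts, one per decoded utterance, with the total
    decoding log-likelihood ('loglik') and the frame count ('num_frames').
    The per-utterance score is loglik / num_frames; the top fraction of
    utterances is returned for addition to the training set.
    """
    scored = sorted(
        decoded,
        key=lambda u: u["loglik"] / u["num_frames"],
        reverse=True,
    )
    keep = max(1, int(len(scored) * top_fraction))
    return scored[:keep]

# Example with two hypothetical decoded utterances.
decoded = [
    {"utt_id": "u1", "hyp": "the cat sat", "loglik": -420.0, "num_frames": 300},
    {"utt_id": "u2", "hyp": "the dog ran", "loglik": -900.0, "num_frames": 310},
]
print([u["utt_id"] for u in select_by_avg_frame_likelihood(decoded)])
```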