2012
DOI: 10.1109/tasl.2012.2199982
Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages

Cited by 14 publications (10 citation statements). References 35 publications.
“…We consider a query as OOV if it contains at least one OOV term. The KWS results are produced for six different ASR systems: (1) GMM, the baseline GMM/HMM system, which is a discriminatively trained, speaker-adaptively trained acoustic model; (2) BSRS, a bootstrap and restructuring model [20] in which the original training data is randomly re-sampled to produce multiple subsets and the resulting models are aggregated at the state level to produce a large, composite model; (3) CU-HTK, a TANDEM HMM system from Cambridge University using cross-word, state-clustered, triphone models trained with MPE, fMPE, and speaker-adaptive training. For efficiency, the MLP features were incorporated in the same fashion as [21]; (4) MLP, a multi-layer perceptron model [22], which is a GMM-based ASR system that uses neural-network features; (5) NN-GMM, a speaker-adaptively and discriminatively trained GMM/HMM system from RWTH Aachen University using bottleneck neural network features [23] and a 4-gram Kneser-Ney LM with optimized discounting parameters [24], using a modified version of the RWTH open source decoder [25]; and (6) DBN, a deep belief network hybrid model [26,27] with discriminative pretraining, frame-level cross-entropy training, and state-level minimum Bayes risk sequence training.…”
Section: Methods
confidence: 99%
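The BSRS system described above re-samples the training data into overlapping subsets and pools the resulting models at the state level. The following is a minimal sketch of that idea in plain Python; the function names, the list-based data layout, and the simple pooling step are all hypothetical illustrations, whereas the actual system aggregates Gaussian mixture components of HMM states.

```python
import random

def bootstrap_subsets(utterances, n_subsets, seed=0):
    """Randomly re-sample the training utterances with replacement,
    producing multiple overlapping bootstrap subsets of the data."""
    rng = random.Random(seed)
    return [
        [rng.choice(utterances) for _ in range(len(utterances))]
        for _ in range(n_subsets)
    ]

def aggregate_states(models):
    """Aggregate the per-state components of all bootstrap models into
    one large composite model.  Here each model is a dict mapping a
    state name to a list of components; aggregation simply pools the
    component lists of matching states."""
    composite = {}
    for model in models:
        for state, components in model.items():
            composite.setdefault(state, []).extend(components)
    return composite
```

In this toy form, a model trained on each bootstrap subset contributes its components for every shared state, so the composite model grows with the number of subsets, mirroring the "large, composite model" described in the quoted passage.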
“…8, 16, 20, and 24, i.e., the ensemble performance increased with the mixture size. This implies that the ensemble model is able to take advantage of the reduced bias in the overfit base models by reducing the variance that accompanies the overfit.…”
Section: Bias and Variance
confidence: 96%
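The bias-variance observation quoted above, that averaging overfit base models keeps their low bias while shrinking their variance, can be illustrated with a toy numerical sketch (a hypothetical example, not the paper's experiment): each base estimator is unbiased but noisy, and averaging independent estimators reduces the variance roughly in proportion to the ensemble size.

```python
import random
import statistics

def noisy_estimate(rng):
    # A single unbiased but high-variance base estimator of the
    # true value 1.0 (standing in for an overfit base model).
    return 1.0 + rng.gauss(0.0, 1.0)

def ensemble_estimate(rng, size):
    # Averaging independent base estimators leaves the bias unchanged
    # but shrinks the variance by roughly a factor of `size`.
    return sum(noisy_estimate(rng) for _ in range(size)) / size

rng = random.Random(0)
singles = [noisy_estimate(rng) for _ in range(2000)]
ensembles = [ensemble_estimate(rng, 16) for _ in range(2000)]
```

Comparing `statistics.pvariance(singles)` with `statistics.pvariance(ensembles)` shows the ensemble estimates clustering much more tightly around 1.0, which is the mechanism the quoted passage credits for the improvement with larger mixture sizes.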
“…In such circumstances, the authors tested a state-mapping approach [20], where the main task is to find the state similarities of average-voice, language-dependent acoustic models [17]. Also, frame mapping and Gaussian component mapping have been explored in [21,22] and [23], respectively.…”
Section: Related Work
confidence: 99%