2011
DOI: 10.1016/j.csl.2010.07.005

The efficient incorporation of MLP features into automatic speech recognition systems


Cited by 23 publications (18 citation statements)
References 31 publications
“…The KWS results are produced for six different ASR systems: (1) GMM, the baseline GMM/HMM system, which is a discriminatively trained, speaker-adaptively trained acoustic model; (2) BSRS, a bootstrap-and-restructuring model [20] in which the original training data is randomly re-sampled to produce multiple subsets and the resulting models are aggregated at the state level to produce a large, composite model; (3) CU-HTK, a TANDEM HMM system from Cambridge University using cross-word, state-clustered, triphone models trained with MPE, fMPE, and speaker-adaptive training. For efficiency, the MLP features were incorporated in the same fashion as [21]; (4) MLP, a multi-layer perceptron model [22], which is a GMM-based ASR system that uses neural-network features; (5) NN-GMM, a speaker-adaptively and discriminatively trained GMM/HMM system from RWTH Aachen University using bottleneck neural-network features [23] and a 4-gram Kneser-Ney LM with optimized discounting parameters [24], decoded with a modified version of the RWTH open-source decoder [25]; and (6) DBN, a deep belief network hybrid model [26,27] with discriminative pre-training, frame-level cross-entropy training, and state-level minimum Bayes risk sequence training. The GMM, BSRS, DBN, and MLP models are built with the IBM Attila toolkit [28].…”
Section: Methods
confidence: 99%
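The excerpt above describes the BSRS idea only at a high level: bootstrap-resample the training data into subsets, train a model per subset, and aggregate the subset models at the state level into one large composite model. A minimal toy sketch of that resample-then-pool pattern follows; the per-state single-Gaussian model, data sizes, and equal mixture weights are illustrative assumptions, not details of the cited system [20].

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for bootstrap-and-restructuring: the real system trains full
# GMM/HMM acoustic models per bootstrap subset; here each "model" is just a
# diagonal Gaussian per state, purely for illustration.
data = rng.normal(size=(1000, 13))  # 1000 frames of 13-dim features (assumed sizes)
n_subsets = 4

def fit_state_gaussian(frames):
    """Fit a diagonal Gaussian (mean, variance) to the frames of one state."""
    return frames.mean(axis=0), frames.var(axis=0)

subset_models = []
for _ in range(n_subsets):
    # Bootstrap: re-sample the training data with replacement.
    idx = rng.integers(0, len(data), size=len(data))
    subset_models.append(fit_state_gaussian(data[idx]))

# State-level aggregation: the per-subset Gaussians become equal-weight
# mixture components of one larger composite state model.
composite = [(1.0 / n_subsets, mu, var) for mu, var in subset_models]
```

The composite model is thus a mixture with `n_subsets` times as many components per state as any single subset model, which is the "large, composite model" effect the excerpt refers to.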
“…The MLP features are computed using a network that takes 9 frames of static, delta, delta-delta, and triple-delta PLP features as input, contains two hidden layers of 2000 logistic units each, a 26-unit bottleneck layer, and a softmax output layer with 39 monophone targets. For efficiency, the MLP features are incorporated in the same fashion as [18]. Supervision for both global CMLLR and subsequent global MLLR adaptation is based on the initial SI decoding.…”
Section: CUED System
confidence: 99%
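The excerpt fully specifies the bottleneck network's layer sizes, so the forward pass that produces the 26-dim features can be sketched directly. In the sketch below, the 52-dim per-frame PLP dimensionality (13 static + 3×13 derivatives), the random placeholder weights, and the exact activation placement around the bottleneck are assumptions; only the layer widths (2000, 2000, 26, 39) come from the excerpt.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Layer widths from the excerpt: 9 stacked frames -> 2000 -> 2000 -> 26 -> 39.
# The 52-dim frame (PLP + delta + delta-delta + triple-delta) is an assumption;
# weights are random placeholders, not a trained model.
rng = np.random.default_rng(0)
dims = [9 * 52, 2000, 2000, 26, 39]
weights = [rng.normal(0.0, 0.01, (a, b)) for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

def bottleneck_features(x):
    """Return the 26-dim bottleneck activations and the monophone posteriors."""
    h = x
    for W, b in zip(weights[:2], biases[:2]):   # two 2000-unit logistic layers
        h = sigmoid(h @ W + b)
    bn = h @ weights[2] + biases[2]             # 26-unit bottleneck (pre-activation)
    posteriors = softmax(sigmoid(bn) @ weights[3] + biases[3])
    return bn, posteriors

x = rng.normal(size=9 * 52)                     # one stacked 9-frame input window
bn, post = bottleneck_features(x)
```

At feature-extraction time only `bn` is kept; the 39-way softmax exists solely to provide monophone training targets for the network.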
“…The GMM-HMM acoustic models (AMs) were trained using the procedure described in [24]. Unilingual and multilingual AMs were each built from a flat start.…”
Section: GMM-HMMs Training
confidence: 99%