2011
DOI: 10.1016/j.csl.2010.07.005

The efficient incorporation of MLP features into automatic speech recognition systems


Cited by 23 publications (18 citation statements)
References 31 publications
“…The KWS results are produced for six different ASR systems: (1) GMM, the baseline GMM/HMM system, which is a discriminatively trained, speaker-adaptively trained acoustic model; (2) BSRS, a bootstrap-and-restructuring model [20] in which the original training data is randomly re-sampled to produce multiple subsets and the resulting models are aggregated at the state level to produce a large, composite model; (3) CU-HTK, a TANDEM HMM system from Cambridge University using cross-word, state-clustered, triphone models trained with MPE, fMPE, and speaker-adaptive training. For efficiency, the MLP features were incorporated in the same fashion as [21]; (4) MLP, a multi-layer perceptron model [22], which is a GMM-based ASR system that uses neural-network features; (5) NN-GMM, a speaker-adaptively and discriminatively trained GMM/HMM system from RWTH Aachen University using bottleneck neural-network features [23] and a 4-gram Kneser-Ney LM with optimized discounting parameters [24], decoded with a modified version of the RWTH open-source decoder [25]; and (6) DBN, a deep belief network hybrid model [26,27] with discriminative pre-training, frame-level cross-entropy training, and state-level minimum Bayes risk sequence training. The GMM, BSRS, DBN, and MLP models are built with the IBM Attila toolkit [28].…”
Section: Methods
confidence: 99%
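The excerpt above describes the BSRS idea only at a high level: bootstrap-resample the training data into subsets, train a model per subset, and aggregate the subset models at the state level into one large composite model. A minimal toy sketch of that resample-then-pool pattern follows; the per-state single-Gaussian model, data sizes, and equal mixture weights are illustrative assumptions, not details of the cited system [20].

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for bootstrap-and-restructuring: the real system trains full
# GMM/HMM acoustic models per bootstrap subset; here each "model" is just a
# diagonal Gaussian per state, purely for illustration.
data = rng.normal(size=(1000, 13))  # 1000 frames of 13-dim features (assumed sizes)
n_subsets = 4

def fit_state_gaussian(frames):
    """Fit a diagonal Gaussian (mean, variance) to the frames of one state."""
    return frames.mean(axis=0), frames.var(axis=0)

subset_models = []
for _ in range(n_subsets):
    # Bootstrap: re-sample the training data with replacement.
    idx = rng.integers(0, len(data), size=len(data))
    subset_models.append(fit_state_gaussian(data[idx]))

# State-level aggregation: the per-subset Gaussians become equal-weight
# mixture components of one larger composite state model.
composite = [(1.0 / n_subsets, mu, var) for mu, var in subset_models]
```

The composite model is thus a mixture with `n_subsets` times as many components per state as any single subset model, which is the "large, composite model" effect the excerpt refers to.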
“…The MLP features are computed using a network that takes 9 frames of static, delta, delta-delta, and triple-delta PLP features as input, contains two hidden layers of 2000 logistic units each, a 26-unit bottleneck layer, and a softmax output layer with 39 monophone targets. For efficiency, the MLP features are incorporated in the same fashion as [18]. Supervision for both global CMLLR and subsequent global MLLR adaptation is based on the initial SI decoding.…”
Section: CUED System
confidence: 99%
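The excerpt fully specifies the bottleneck network's layer sizes, so the forward pass that produces the 26-dim features can be sketched directly. In the sketch below, the 52-dim per-frame PLP dimensionality (13 static + 3×13 derivatives), the random placeholder weights, and the exact activation placement around the bottleneck are assumptions; only the layer widths (2000, 2000, 26, 39) come from the excerpt.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Layer widths from the excerpt: 9 stacked frames -> 2000 -> 2000 -> 26 -> 39.
# The 52-dim frame (PLP + delta + delta-delta + triple-delta) is an assumption;
# weights are random placeholders, not a trained model.
rng = np.random.default_rng(0)
dims = [9 * 52, 2000, 2000, 26, 39]
weights = [rng.normal(0.0, 0.01, (a, b)) for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

def bottleneck_features(x):
    """Return the 26-dim bottleneck activations and the monophone posteriors."""
    h = x
    for W, b in zip(weights[:2], biases[:2]):   # two 2000-unit logistic layers
        h = sigmoid(h @ W + b)
    bn = h @ weights[2] + biases[2]             # 26-unit bottleneck (pre-activation)
    posteriors = softmax(sigmoid(bn) @ weights[3] + biases[3])
    return bn, posteriors

x = rng.normal(size=9 * 52)                     # one stacked 9-frame input window
bn, post = bottleneck_features(x)
```

At feature-extraction time only `bn` is kept; the 39-way softmax exists solely to provide monophone training targets for the network.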
“…The GMM-HMM acoustic models (AMs) were trained using the procedure described in [24]. Unilingual and multilingual AMs were each built from a flat start.…”
Section: GMM-HMMs Training
confidence: 99%