2010
DOI: 10.1109/tasl.2010.2052610

Robust Speech Recognition Based on Dereverberation Parameter Optimization Using Acoustic Model Likelihood

Abstract: Automatic speech recognition (ASR) in reverberant environments is a challenging task. Most dereverberation techniques address this problem through signal processing and enhance the reverberant waveform independently of the speech recognizer. In this paper, we propose a novel scheme to perform dereverberation in relation to the likelihood of the back-end ASR system. Our proposed approach effectively selects the dereverberation parameters, in the form of multiband scale factors, so that they improve the likel…
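The abstract is truncated, so the paper's actual procedure is not visible here. As a rough illustration of the general idea (choosing per-band scale factors by how well the processed features score under an acoustic model), here is a minimal sketch. Everything in it is an assumption made for illustration: the floored spectral-subtraction form, the single diagonal Gaussian standing in for the acoustic model, the grid of candidate values, and the names `diag_gauss_loglik` and `select_scale_factors`.

```python
import numpy as np


def diag_gauss_loglik(feats, mean, var):
    """Total log-likelihood of feature frames under one diagonal Gaussian.

    Stand-in for an acoustic model score (an assumption, not the paper's model).
    """
    diff = feats - mean
    per_frame = -0.5 * np.sum(np.log(2.0 * np.pi * var) + diff ** 2 / var, axis=1)
    return float(per_frame.sum())


def select_scale_factors(power, late, mean, var, candidates):
    """Pick one scale factor per band by a band-by-band grid search.

    power: (frames, bands) band power of the reverberant signal
    late:  (frames, bands) hypothetical estimate of late-reverberation power
    mean, var: (bands,) parameters of the stand-in Gaussian on log band power
    candidates: iterable of scale-factor values to try for every band
    """
    power = np.asarray(power, dtype=float)
    late = np.asarray(late, dtype=float)
    n_bands = power.shape[1]
    scales = np.full(n_bands, float(candidates[0]))

    def score(s):
        # Floored subtraction keeps the band power positive before the log.
        cleaned = np.maximum(power - s * late, 1e-3 * power)
        return diag_gauss_loglik(np.log(cleaned), mean, var)

    for b in range(n_bands):
        best, best_ll = scales[b], -np.inf
        for c in candidates:
            trial = scales.copy()
            trial[b] = c
            ll = score(trial)
            if ll > best_ll:
                best, best_ll = float(c), ll
        scales[b] = best
    return scales
```

Because the stand-in model is diagonal, each band contributes to the likelihood independently, so searching one band at a time finds the same answer as a joint search over the grid. A full recognizer would not have that property, and a joint or iterative search would be needed.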

Cited by 25 publications (26 citation statements)
References 25 publications
“…There are many successful designs focusing on the optimisation of dereverberation parameters based on feedback obtained from ASR results (Gomez, Kawahara, 2010;Seltzer et al, 2004). …”
Section: Dereverberation Methods (mentioning)
confidence: 99%
“…However, this assumption may not hold true; thus, we show later an optimization process aimed to further strengthen the assumption in the wavelet […] and B(f, w). Consequently, the recovered early reflection is processed via Cepstral Mean Normalization (CMN) [6] prior to the ASR. From this point onward, we assume that processing is conducted in framewise manner, dropping the index w.…”
Section: A) Model For Dereverberation (mentioning)
confidence: 99%
“…Depending on the size of the training dataset, the number of Gaussian mixture components is increased to improve signal discrimination. In our system, a total of 256 Gaussian mixture components are used for each model and the training of the two GMM classes is based on the Expectation-Maximization algorithm [5]. The microphone array-processed data is windowed using a 25 ms frame.…”
Section: ) Speaker Diarization (mentioning)
confidence: 99%
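The windowing step mentioned in the statement above can be sketched as follows. Only the 25 ms frame length comes from the quoted text; the 10 ms hop, the Hamming window, and the function name `frame_signal` are assumptions chosen for illustration.

```python
import numpy as np


def frame_signal(x, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping, Hamming-windowed frames.

    frame_ms=25.0 follows the quoted statement; hop_ms and the window
    shape are assumed values, not taken from the source.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(x) < frame_len:
        # Signal shorter than one frame: return an empty frame matrix.
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```

At a 16 kHz sampling rate, a 25 ms frame is 400 samples and a 10 ms hop is 160 samples. Features computed from each row could then be passed to a GMM trained with EM, as the statement describes.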
“…The system is scalable and portable, and employs established and state-of-the-art techniques in speech [5], vision [6], and graphics processing (GPU) for multiple-people multimodal interaction data capture and analysis. It consists of a large display equipped with multiple sensing devices spaced on a portable structure: one microphone array, six HD video cameras, and two depth sensors (see Fig.…”
Section: Introduction (mentioning)
confidence: 99%