2011 Joint Workshop on Hands-Free Speech Communication and Microphone Arrays
DOI: 10.1109/hscma.2011.5942396

Multi-style training of HMMs with stereo data for reverberation-robust speech recognition

Abstract: A novel training algorithm using data pairs of clean and reverberant feature vectors for estimating robust Hidden Markov Models (HMMs), introduced in [1] for matched training, is employed in this paper for multi-style training. The multi-style HMMs are derived from well-trained clean-speech HMMs by aligning the clean data to the clean-speech HMM and using the resulting state-frame alignment to estimate the Gaussian mixture densities from the reverberant data of several different rooms. Thus, the temporal alignm…
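The estimation step described in the abstract can be illustrated with a minimal sketch. It assumes the state-frame alignment has already been obtained by aligning the clean features to the clean-speech HMM, and it fits a single diagonal Gaussian per state rather than the Gaussian mixture densities the paper estimates. The function name `estimate_state_gaussians` and the `var_floor` parameter are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def estimate_state_gaussians(alignment, reverberant_frames, var_floor=1e-6):
    """Fit one diagonal Gaussian per HMM state from reverberant frames.

    alignment[t] is the state assigned to frame t by aligning the CLEAN
    features to the clean-speech HMM (assumed to be given here).
    reverberant_frames[t] is the reverberant feature vector of the same frame.
    Simplification: a single Gaussian per state, not a mixture.
    """
    if len(alignment) != len(reverberant_frames):
        raise ValueError("stereo data must be frame-synchronous")

    # Group reverberant frames by the state taken from the clean alignment.
    frames_by_state = defaultdict(list)
    for state, frame in zip(alignment, reverberant_frames):
        frames_by_state[state].append(frame)

    params = {}
    for state, frames in frames_by_state.items():
        n = len(frames)
        dim = len(frames[0])
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # Maximum-likelihood variance, floored to avoid degenerate densities.
        var = [
            max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)
        ]
        params[state] = (mean, var)
    return params


# Toy stereo data: four frames, two states, two-dimensional features.
alignment = [0, 0, 1, 1]
reverberant = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [10.0, 2.0]]
params = estimate_state_gaussians(alignment, reverberant)
```

For multi-style training in the sense of the abstract, the aligned frame pairs from several rooms would be pooled before the call, so that each state's density is estimated from reverberant data of all rooms.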

Citations: cited by 8 publications (7 citation statements)
References: 5 publications
“…The increased recognition performance of ICEWIND over conventional Baum-Welch method is shown in Ref. [23].…”
Section: Decoder-based Approach (mentioning)
confidence: 95%
“…In this case, only the RIR of the target room needs to be measured or estimated, and then a set of reverberant training data can be generated by convolving clean speech signals with this RIR. This concept is used in the information combining estimation with non-reverberant data (ICEWIND) method [23]. ICEWIND uses ordered pairs [s(k), x(k)] of clean and reverberant feature vector sequences as training data to determine the parameters λ_x of a reverberant HMM.…”
Section: Decoder-based Approach (mentioning)
confidence: 99%
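The data-generation step mentioned in this statement, convolving a clean speech signal with a room impulse response (RIR), can be sketched as follows. This is a plain time-domain convolution on toy values; the function name `reverberate` is hypothetical, and a practical system would more likely use an FFT-based convolution on real recordings.

```python
def reverberate(clean, rir):
    """Convolve a clean signal with a room impulse response (RIR).

    Both inputs are sequences of samples; the result has
    len(clean) + len(rir) - 1 samples (full linear convolution).
    """
    if not clean or not rir:
        return []
    out = [0.0] * (len(clean) + len(rir) - 1)
    for i, sample in enumerate(clean):
        for j, tap in enumerate(rir):
            # Each clean sample excites a delayed, scaled copy of the RIR.
            out[i + j] += sample * tap
    return out


# Toy example: a three-sample signal and a decaying three-tap RIR.
clean = [1.0, 0.0, -1.0]
rir = [1.0, 0.5, 0.25]
reverberant = reverberate(clean, rir)
```

Extracting features from `clean` and `reverberant` with the same framing would then give frame-synchronous pairs of the kind denoted [s(k), x(k)] in the statement above.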
“…This very accurate temporal alignment is then used to estimate the parameters of the emission densities of the reverberant HMM by applying the Expectation Maximization (EM) algorithm to the reverberant data. It has been shown in [5,7] that this approach significantly reduces the computational complexity of the HMM training and improves the recognition rates at the same time, both for room-specific and multi-style training of word-level HMMs for a connected-digit recognition task.…”
Section: Introduction (mentioning)
confidence: 99%
“…Promising approaches to reduce this mismatch are to train the acoustic models on corrupted data [1,2] and multi-style training [3][4][5][6]. In this paper, we follow these approaches using synthetically reverberated training data.…”
Section: Introduction (mentioning)
confidence: 99%
“…For clarity, we entirely focus in this article on the compensation rules while ignoring the parameter estimation step. We also disregard approaches that apply a modified training method to conventional HMMs without exhibiting a distinct compensation step, as it is characteristic for, e.g., discriminative [28], multi-condition [29], or reverberant training [30].…”
Section: The Bayesian View (mentioning)
confidence: 99%