2002
DOI: 10.1006/csla.2002.0191

Hidden Markov model training with contaminated speech material for distant-talking speech recognition

Cited by 32 publications (16 citation statements)
References 41 publications
“…Some data sets were based on simulations realized applying a contamination method [20,21,22] that combines clean-speech signals, estimated Impulse Responses (IRs), and real multichannel background noise sequences, as described in [18]. Other corpora were recorded under real-world conditions.…”
Section: Dirha Corpora
Citation type: mentioning
confidence: 99%
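The contamination idea quoted above (clean speech, an impulse response, and background noise combined into one simulated signal) can be illustrated with a minimal single-channel sketch. This is not the cited authors' implementation: the function names `convolve`, `power`, and `contaminate`, the `snr_db` parameter, and the choice to truncate the reverberant signal to the clean-speech length are all assumptions made for illustration.

```python
import math


def convolve(signal, ir):
    """Direct-form linear convolution; output has len(signal) + len(ir) - 1 samples."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, ir, noise, snr_db):
    """Hypothetical helper: reverberate `clean` with `ir`, then add `noise` at `snr_db`."""
    # Assumption: drop the reverberation tail so the output matches the input length.
    reverberant = convolve(clean, ir)[:len(clean)]
    if len(noise) < len(reverberant):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[:len(reverberant)]
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be silent")
    # Choose the gain so that power(reverberant) / power(gain * noise) = 10^(snr_db / 10).
    gain = math.sqrt(power(reverberant) / (noise_power * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberant, noise)]
```

A multichannel version would presumably apply one impulse response and one noise channel per microphone; the quoted statement does not give those details, so they are left out here.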
“…The DIRHA-ENGLISH simulated data sets derive from the clean speech described in Section 3.1, and from the application of the contamination method discussed in [20,30].…”
Section: Simulated Data Set
Citation type: mentioning
confidence: 99%
“…For clarity, we entirely focus in this article on the compensation rules while ignoring the parameter estimation step. We also disregard approaches that apply a modified training method to conventional HMMs without exhibiting a distinct compensation step, as it is characteristic for, e.g., discriminative [28], multi-condition [29], or reverberant training [30].…”
Section: The Bayesian View
Citation type: mentioning
confidence: 99%
“…to $y_n$. On the other hand, the modified MAP approximation (29) leads to a scaled version of the exact likelihood $p(y_n \mid q_n)$, cf. (27), with the scaling factor $p(x_n^{\mathrm{MAP}} \mid y_n, q_n)$ being all the higher with increasing accuracy of the approximation (29).…”
Section: Remos
Citation type: mentioning
confidence: 99%
“…environment prevents this form of matched reverberant training from being widely used in real-world applications. Synthetically generating the training data as suggested by [3,4] instead of recording them significantly reduces the effort for matched training. In this case, only the Room Impulse Response (RIR) of the target room needs to be measured or estimated, and then, a set of reverberant training data can be generated by convolving clean-speech signals with this RIR.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
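The last statement describes building a matched reverberant training set by convolving clean-speech signals with a single Room Impulse Response. A minimal sketch of that step follows. The names `reverberate` and `make_reverberant_set` are hypothetical, signals are plain lists of samples, and the full reverberation tail is kept, which is a choice of this sketch and not something the quoted text specifies.

```python
def reverberate(clean, rir):
    """Linear convolution of one clean utterance with a room impulse response."""
    out = [0.0] * (len(clean) + len(rir) - 1)
    for i, s in enumerate(clean):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def make_reverberant_set(utterances, rir):
    """Hypothetical helper: apply the same measured or estimated RIR to every utterance."""
    return [reverberate(u, rir) for u in utterances]
```

For signals of realistic length, an FFT-based convolution would likely be preferable to this direct double loop, which is kept here only for readability.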