2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2011.5947376

Non-negative matrix deconvolution in noise robust speech recognition

Abstract: High noise robustness has been achieved in speech recognition by using sparse exemplar-based methods with spectrogram windows spanning up to 300 ms. A downside is that a large exemplar dictionary is required to cover sufficiently many spectral patterns and their temporal alignments within windows. We propose a recognition system based on a shift-invariant convolutive model, where exemplar activations at all the possible temporal positions jointly reconstruct an utterance. Recognition rates are evaluated using …
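The shift-invariant convolutive model described in the abstract can be illustrated with a minimal non-negative matrix deconvolution (NMD) sketch. This is a generic Euclidean-cost NMD with multiplicative updates, not the paper's exact formulation; the function and variable names are illustrative assumptions.

```python
import numpy as np

def shift(X, t):
    """Shift columns of X right by t, zero-filling on the left."""
    if t == 0:
        return X
    Y = np.zeros_like(X)
    Y[:, t:] = X[:, :-t]
    return Y

def unshift(X, t):
    """Shift columns of X left by t, zero-filling on the right (adjoint of shift)."""
    if t == 0:
        return X
    Y = np.zeros_like(X)
    Y[:, :-t] = X[:, t:]
    return Y

def nmd(V, K, T, n_iter=200, eps=1e-9, seed=0):
    """Non-negative matrix deconvolution with Euclidean cost.
    V (F x N) is approximated by sum_t W[t] @ shift(H, t), so each of the
    K exemplars (a T-frame spectrogram patch stored across W[0..T-1])
    may be activated at any temporal position via the activations H."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((T, F, K))
    H = rng.random((K, N))
    for _ in range(n_iter):
        Lam = sum(W[t] @ shift(H, t) for t in range(T))
        # multiplicative update for the activations H
        num = sum(W[t].T @ unshift(V, t) for t in range(T))
        den = sum(W[t].T @ unshift(Lam, t) for t in range(T)) + eps
        H *= num / den
        Lam = sum(W[t] @ shift(H, t) for t in range(T))
        # multiplicative update for each time slice of the bases W
        for t in range(T):
            Ht = shift(H, t)
            W[t] *= (V @ Ht.T) / (Lam @ Ht.T + eps)
    return W, H

# usage: factorize a toy 20 x 50 non-negative "spectrogram"
V = np.abs(np.random.default_rng(1).standard_normal((20, 50)))
W, H = nmd(V, K=5, T=4)
approx = sum(W[t] @ shift(H, t) for t in range(4))
```

The key difference from windowed NMF is visible in the model equation: the same activation matrix H is shifted and reused for every frame of each exemplar, so an utterance is reconstructed jointly rather than window by window.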

Cited by 21 publications (13 citation statements)
References 12 publications
“…For noise, it is not guaranteed that similar sound events will be encountered in actual use cases. In our work on AURORA-2, we saw error rates increasing by up to 60% for mismatched noises (Gemmeke et al., 2011b; Hurmalainen et al., 2011a). Because a noise mismatch degrades the effectiveness of speech-noise separation, and keeping a generic database for all possible noise types would be infeasible, methods for context-sensitive noise modelling are needed for practical applications.…”
Section: Pre-sampled Exemplar Bases
confidence: 84%
“…The aforementioned NMF algorithms can be run on magnitude, power, and Mel-scale spectra. As an additional transformation of the spectrogram V, a sliding window can be applied as in [7], transforming the original spectrogram V into a windowed matrix whose columns correspond to overlapping sequences of short-time spectra in V. This provides a contextual factorization as does NMD, but with each sliding window factorized separately; whether this approach is superior to NMD seems to depend on the application [12]. Note that openBliSSART also implements inverse operations to the aforementioned transformations of the spectrogram, including Mel filtering and the sliding window, to allow proper signal reconstruction.…”
Section: Component Separation Algorithms
confidence: 99%
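The sliding-window transformation described in the excerpt above — stacking overlapping sequences of short-time spectra into the columns of a larger matrix — can be sketched as follows. This is a generic illustration, not openBliSSART's implementation; the function name and variables are assumptions.

```python
import numpy as np

def sliding_window(V, T):
    """Stack T consecutive frames of a spectrogram V (F x N) into the
    columns of an (F*T) x (N-T+1) matrix. Each output column is one
    windowed context, so a plain (non-convolutive) NMF on the result
    factorizes temporal context, with each window handled separately."""
    F, N = V.shape
    # Fortran-order flattening stacks the T frames frame-by-frame
    cols = [V[:, i:i + T].reshape(-1, order="F") for i in range(N - T + 1)]
    return np.stack(cols, axis=1)

# usage: a toy 3-band, 4-frame spectrogram with a 2-frame window
V = np.arange(12.0).reshape(3, 4)
Vw = sliding_window(V, T=2)   # shape (6, 3)
```

Column i of the output is simply frames i and i+1 of V concatenated, which is what allows each window to be factorized independently, in contrast to NMD's jointly shifted activations.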
“…The different DFT window sizes considered were powers of two, ranging from 2^6 to 2^12, or 8-256 ms assuming a 16 kHz sample rate. We evaluated the RTF for both CPU and GPU computation, taking the elapsed computation time over the length of the mixed signals.…”
Section: Benchmark Performances In Supervised Speech Separation
confidence: 99%
“…Using exemplars in a sparse representation (SR) formulation provides significantly improved noise robustness, and exemplar-based sparse representations have been successfully used for feature extraction, speech enhancement and noise-robust speech recognition tasks [17][18][19][20]. These approaches model the acoustics using fixed-length exemplars which are labeled at frame level and stored in the columns of a single overcomplete dictionary.…”
Section: Introduction
confidence: 99%
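The fixed-dictionary sparse representation idea in the excerpt above can be sketched with a non-negative, L1-penalized activation solver. This is a generic multiplicative-update scheme for illustration, not the cited papers' exact method; all names are assumptions.

```python
import numpy as np

def sparse_activations(V, A, lam=0.1, n_iter=500, eps=1e-9, seed=0):
    """Approximately solve  min_{X >= 0}  0.5*||V - A @ X||^2 + lam*sum(X)
    with multiplicative updates. A holds fixed exemplars in its columns
    (an overcomplete dictionary); lam controls the sparsity of X."""
    rng = np.random.default_rng(seed)
    K = A.shape[1]
    X = rng.random((K, V.shape[1]))
    AtV = A.T @ V
    for _ in range(n_iter):
        # the +lam term in the denominator shrinks activations toward zero
        X *= AtV / (A.T @ (A @ X) + lam + eps)
    return X

# usage: a dictionary of 20 random exemplars; the observation uses only 3
rng = np.random.default_rng(1)
A = rng.random((10, 20))
X_true = np.zeros((20, 5))
X_true[[2, 7, 11], :] = rng.random((3, 5))
V = A @ X_true
X = sparse_activations(V, A, lam=0.01)
```

With a small penalty the solver reconstructs V from few active dictionary columns, which is the mechanism the excerpt credits for noise robustness: speech and noise exemplars compete for activations in one overcomplete dictionary.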