2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014
DOI: 10.1109/icassp.2014.6854660

Impact of single-microphone dereverberation on DNN-based meeting transcription systems

Abstract: Over the past few decades, a range of front-end techniques have been proposed to improve the robustness of automatic speech recognition systems against environmental distortion. While these techniques are effective for small tasks consisting of carefully designed data sets, especially when used with a classical acoustic model, there has been limited evidence that they are useful for a state-of-the-art system with large scale realistic data. This paper focuses on reverberation as a type of distortion and investi…

Cited by 24 publications (13 citation statements)
References 21 publications
“…Approaches to noise-robust speech recognition can generally be classified into two classes: front-end based and back-end based [1]. The front-end based approaches aim at removing distortions from the observations prior to recognition, and can either take place in time domain, spectral domain, or directly from the corrupted feature vectors [2,3]. The back-end approaches… [Footnote: This work was supported by Samsung Electronics Co. Ltd, South Korea, under the project "Acoustic Model Adaptation toward Spontaneous Speech and Environment".]”
Section: Introduction (mentioning)
confidence: 99%
“…We apply de-reverberation based on the Weighted Prediction Error (WPE) algorithm [14,15] as front-end processing. This method is based on robust blind deconvolution using long-term linear prediction, with the motive of reducing the effects of the late reverberation.…”
Section: WPE De-reverberation (mentioning)
confidence: 99%
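The long-term linear prediction idea in the statement above can be illustrated with a minimal single-channel sketch for one STFT frequency bin. This is not the implementation used by the cited or citing papers. The function name `wpe_one_bin`, its parameters (`taps`, `delay`, `iterations`, `context`), and the moving-average variance estimate are all assumptions chosen for illustration.

```python
import numpy as np

def wpe_one_bin(y, taps=10, delay=3, iterations=3, context=2, eps=1e-8):
    """Illustrative single-channel WPE for one frequency bin.

    y: complex STFT coefficients of one bin, shape (T,).
    Returns an estimate of the signal with late reverberation reduced.
    All names and defaults here are hypothetical.
    """
    T = y.shape[0]
    # Delayed observation matrix: column k holds y shifted by (delay + k) frames.
    # The delay skips the direct path and early reflections.
    Y = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        shift = delay + k
        if shift < T:
            Y[shift:, k] = y[:T - shift]

    kernel = np.ones(2 * context + 1) / (2 * context + 1)
    d = y.copy()
    for _ in range(iterations):
        # Time-varying variance of the desired signal, smoothed over
        # neighbouring frames (an assumption made for numerical stability).
        lam = np.convolve(np.abs(d) ** 2, kernel, mode="same")
        lam = np.maximum(lam, eps)
        # Weighted least squares: minimise sum_t |y_t - Y_t g|^2 / lam_t.
        Yw = Y / lam[:, None]
        R = Yw.conj().T @ Y
        p = Yw.conj().T @ y
        g = np.linalg.solve(R + eps * np.eye(taps), p)
        # Subtract the predicted late reverberation.
        d = y - Y @ g
    return d
```

In a full system this routine would be applied independently to every frequency bin of the STFT before resynthesis or feature extraction. Multi-channel variants stack the delayed frames of all microphones into `Y`.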
“…As obtaining the actual noisy data is costly, the training data is artificially corrupted with reverberation and noise of different profiles. On the other hand, speech enhancement methods are used to reduce the interference in the speech signal either by de-reverberation [14,15,16] or noise reduction [17,13]. Moreover, the speech features can be engineered to alleviate the sensitivity to the recording environment [18,19,20], typically replacing the traditional nonlinearity in the mel scale with another power-law non-linearity, e.g.…”
Section: Introduction (mentioning)
confidence: 99%
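The artificial corruption of training data mentioned in the statement above can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure of any cited paper: the room impulse response is synthetic (exponentially decaying white noise), the additive noise is white, and the helper names `synthetic_rir` and `corrupt` are hypothetical.

```python
import numpy as np

def synthetic_rir(rt60, sample_rate, rng):
    """Toy room impulse response: white noise whose amplitude envelope
    decays by 60 dB over rt60 seconds (an assumed, simplified model)."""
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir

def corrupt(clean, rir, snr_db, rng):
    """Convolve clean speech with an impulse response, then add white
    noise scaled so the reverberant-signal-to-noise ratio equals snr_db."""
    reverberant = np.convolve(clean, rir)[: len(clean)]
    noise = rng.standard_normal(len(clean))
    signal_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reverberant + scale * noise
```

Varying `rt60` and `snr_db` across utterances gives the "different profiles" of reverberation and noise. Measured impulse responses and recorded noise would normally replace the synthetic ones used here.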
“…These approaches however can not be directly applied to DNNs because of the different structure of modeling parameters. Nevertheless, there have been some investigations of using feature-domain transform-based approaches such as feature-space MLLR (fMLLR) applied to DNNs [12,13,14]. Apart from speaker variabilities, variations in the audio recording process such as reverberations, speaker-to-microphone distance (e.g., close-talk or far-field), or recording devices can lead to significant differences in acoustic patterns.…”
Section: Introduction (mentioning)
confidence: 99%