2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
DOI: 10.1109/iscslp.2018.8706595
Two-Stage Enhancement of Noisy and Reverberant Microphone Array Speech for Automatic Speech Recognition Systems Trained with Only Clean Speech

Cited by 10 publications (8 citation statements) · References 37 publications
“…We avoid giving the proof of our new results here, but the experimental evidence given in the next section on the speech enhancement task supports our theoretical results in Eq. (2).…”
Section: Tensor-to-Vector Regression
Mentioning confidence: 99%
“…Deep neural network (DNN) based speech enhancement [1] has demonstrated state-of-the-art performances in a single-channel setting. It has also been extended to multi-channel speech enhancement with similar high-quality enhanced speech [2]. A recent overview can be found in [3].…”
Section: Introduction
Mentioning confidence: 99%
“…For both training and test datasets, the setting of RIRs was fixed to the same conditions, such as the room size, RT60, and all of the distances and directions. Additional detail about the data simulation procedure can be found in [3,17].…”
Section: Data Preparation
Mentioning confidence: 99%
“…The state-of-the-art speech enhancement systems are commonly built with deep neural network (DNN) based vector-to-vector regression models, where inputs are context-dependent log power spectrum (LPS) features of noisy speech and outputs correspond to either clean or enhanced LPS features. Although deep neural network (DNN) based speech enhancement [1,2] has demonstrated the state-ofthe-art performance under a single-channel setting, it can also be extended to scenarios of multi-channel speech enhancement with even better-enhanced speech qualities [3]. The process of both single and multi-channel speech enhancement can be taken as a DNN based vector-to-vector regression aiming at bridging a functional relationship f : Y → X such that the input noisy speech y ∈ Y can be mapped to the corresponding clean speech x ∈ X.…”
Section: Introduction
Mentioning confidence: 99%
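The excerpt above describes DNN-based speech enhancement as vector-to-vector regression f : Y → X, mapping context-dependent noisy log-power-spectrum (LPS) features to clean LPS features. A minimal sketch of one forward pass of such a regression model is shown below; the layer sizes, the 7-frame context window, and the randomly initialised weights are illustrative assumptions, not the architecture of the cited paper.

```python
import numpy as np

# Assumed dimensions: 257-bin LPS features, a 7-frame context window
# stacked into the input vector, one enhanced frame as the output.
N_BINS, CONTEXT = 257, 7
IN_DIM, OUT_DIM, HIDDEN = N_BINS * CONTEXT, N_BINS, 1024

rng = np.random.default_rng(0)

# Randomly initialised weights stand in for a trained enhancement DNN.
W1 = rng.standard_normal((IN_DIM, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, OUT_DIM)) * 0.01
b2 = np.zeros(OUT_DIM)

def enhance_frame(noisy_context: np.ndarray) -> np.ndarray:
    """One evaluation of the regression f: Y -> X, mapping a
    context-stacked noisy LPS vector y to an enhanced LPS vector."""
    h = np.maximum(0.0, noisy_context @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                            # linear output layer

noisy = rng.standard_normal(IN_DIM)  # stand-in for real LPS features
enhanced = enhance_frame(noisy)
print(enhanced.shape)  # (257,)
```

In practice such a model is trained with a mean-squared-error loss between the predicted and clean LPS vectors; the same formulation extends to multi-channel input by stacking features from all microphones into the input vector.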
“…In [17,18], a unified DNN-based SD speech separation and enhancement system was proposed to jointly handle both background noise and interfering speech, where the speaker-specific data used for DNN training is about 2 hours. In [19], a two-stage approach was proposed for SD enhancement of far-field microphone array speech collected in reverberant conditions corrupted by interfering speakers and noises, where 5 minutes of speakerspecific data is used. [20] adopted more than 5 minutes of speaker-specific data to train a two-stage single-channel SD speech separation system.…”
Section: Introduction
Mentioning confidence: 99%