ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414062

ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network

Abstract: It remains a tough challenge to recover the speech signals contaminated by various noises under real acoustic environments. To this end, we propose a novel system for denoising in the complicated applications, which is mainly comprised of two pipelines, namely a two-stage network and a post-processing module. The first pipeline is proposed to decouple the optimization problem w.r.t. magnitude and phase, i.e., only the magnitude is estimated in the first stage and both of them are further refined in the second …
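The decoupling described in the abstract can be illustrated with a minimal sketch. It assumes the first stage pairs its magnitude estimate with the unmodified noisy phase, and that the second stage refines magnitude and phase together by adding a complex-valued correction to the coarse spectrum; the abstract is truncated and confirms neither detail. The names `two_stage_enhance`, `halve_magnitude`, and `small_correction` are hypothetical stand-ins, not the authors' networks.

```python
import cmath
from typing import Callable, List, Tuple

Spectrum = List[complex]  # one complex value per time-frequency bin


def two_stage_enhance(
    noisy: Spectrum,
    estimate_magnitude: Callable[[List[float]], List[float]],
    estimate_residual: Callable[[Spectrum, Spectrum], Spectrum],
) -> Tuple[Spectrum, Spectrum]:
    """Return (coarse, refined) spectra for a noisy complex spectrum."""
    noisy_mag = [abs(x) for x in noisy]
    noisy_phase = [cmath.phase(x) for x in noisy]

    # Stage 1: estimate the magnitude only and keep the noisy phase (assumption).
    coarse_mag = estimate_magnitude(noisy_mag)
    coarse = [cmath.rect(m, p) for m, p in zip(coarse_mag, noisy_phase)]

    # Stage 2: a complex correction changes magnitude and phase jointly (assumption).
    residual = estimate_residual(noisy, coarse)
    refined = [c + r for c, r in zip(coarse, residual)]
    return coarse, refined


# Toy stand-ins for the two trained stages (hypothetical, for illustration only).
def halve_magnitude(mag: List[float]) -> List[float]:
    return [0.5 * m for m in mag]


def small_correction(noisy: Spectrum, coarse: Spectrum) -> Spectrum:
    return [0.1j for _ in coarse]


noisy = [3 + 4j, 1 - 1j, -2 + 0j]
coarse, refined = two_stage_enhance(noisy, halve_magnitude, small_correction)
```

In this sketch the coarse output can only change the magnitude of each bin, while the refined output is free to move the phase as well, which is the separation of concerns the abstract describes.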

Citations: cited by 45 publications (22 citation statements)
References: 29 publications
“…2, where more noise is suppressed when CP is applied. Finally, when post-processing is applied, PESQ is decreased due to some spectrum information lost [33]. However the use of post-processing is beneficial to subjective listening as shown in previous works [27,33] because unnatural residual noise is further suppressed.…”
Section: Results (mentioning)
confidence: 94%
“…To ensure consistency in the optimization of the RI and magnitude spectrum, we adopt the loss function form of combined mean square error (cMSE) in [2], as follows:…”
Section: Loss Function (mentioning)
confidence: 99%
“…However, most of the previous studies on speech enhancement are for narrow-band (8 kHz) or wide-band (16 kHz) audio, and there are few methods for 48 kHz full-band audio. Deep learning-based speech enhancement methods [1,2,3] have achieved impressive performance on wide-band audio, but the lack of sufficient training data has become a major limitation for full-band deep learning speech enhancement methods. The recent 4th Microsoft Deep Noise Suppression (DNS-4) Challenge extends efforts to full-band single-channel speech enhancement tasks with a massive training dataset and real-scenario test set.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, they are targeted at teleconferencing scenarios, where a processing latency as large as 40 ms is allowed. For example, DCCRN [14] has an algorithmic latency of 62.5 ms and TSCN-PP [53] 20 ms. In addition, these models share many similarities with our complex T-F domain DNN models and can straightforwardly leverage our proposed techniques to reduce their algorithmic latency.…”
Section: B Benchmark Systems (mentioning)
confidence: 99%
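
The latency figures quoted in the statement above can be related to frame parameters with a small sketch. It assumes the usual accounting for STFT-based overlap-add systems, where algorithmic latency is one analysis window plus any look-ahead frames, each adding one hop. The function name and the parameter values below are hypothetical; the source does not state the window, hop, or look-ahead settings of either model.

```python
def algorithmic_latency_ms(window_ms: float, hop_ms: float, lookahead_frames: int = 0) -> float:
    """Algorithmic latency of an STFT overlap-add system, in milliseconds.

    One full analysis window must be buffered before a frame can be
    processed, and every look-ahead frame delays the output by one more hop.
    """
    if window_ms <= 0 or hop_ms <= 0 or lookahead_frames < 0:
        raise ValueError("window and hop must be positive, look-ahead non-negative")
    return window_ms + lookahead_frames * hop_ms


# A 20 ms window with no look-ahead matches the 20 ms quoted for TSCN-PP
# (the window length itself is an assumption, not taken from the source).
no_lookahead = algorithmic_latency_ms(window_ms=20.0, hop_ms=10.0)

# One hypothetical combination that sums to 62.5 ms; shown as arithmetic only,
# not as the confirmed configuration of any cited model.
with_lookahead = algorithmic_latency_ms(window_ms=25.0, hop_ms=6.25, lookahead_frames=6)
```

Both quoted figures exceed the window length alone only when look-ahead is present, which is why low-latency designs tend to shorten the window and avoid future frames.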