2020
DOI: 10.48550/arxiv.2008.00264
Preprint

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Abstract: Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF-masks or speech spectrum, via a naive convolution neural network (CNN) or recurrent neural network (RNN). Some recent studies use complex-valued spectrogram as a training target but train in a real-valued network, predicting the magnitude and phase component or real and imaginary part, respectively. Particularly, convolutio…

Cited by 66 publications (120 citation statements)
References: 34 publications

“…The decoder predicts a complex ratio mask $M = \mathrm{Cat}(M_r, M_i) \in \mathbb{R}^{T \times 2F}$, where $M_r$ and $M_i$ represent the real and imaginary parts of the mask. We use the mask applying scheme of DCCRN-E [3], which is called Mask Apply E in Fig. 1,…”
Section: Coarse Enhancement Module (mentioning)
confidence: 99%
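
To make the quoted mask scheme concrete, here is a minimal NumPy sketch of a polar-coordinate mask application in the style usually attributed to DCCRN-E: the mask magnitude is squashed with tanh and the mask phase is added to the noisy phase. The function name `apply_mask_polar`, the argument names, and the (T, 2F) layout of the concatenated mask are assumptions made for illustration; the citing paper's actual implementation is not shown in the excerpt.

```python
import numpy as np

def apply_mask_polar(spec, mask_cat):
    """Apply a complex ratio mask in polar form (illustrative sketch).

    spec:     complex noisy spectrogram, shape (T, F)
    mask_cat: real array of shape (T, 2F), assumed to be Cat(M_r, M_i)
    """
    n_freq = spec.shape[-1]
    # Split the concatenated mask into its real and imaginary halves.
    m_r, m_i = mask_cat[..., :n_freq], mask_cat[..., n_freq:]
    # Bounded mask magnitude and unbounded mask phase.
    mask_mag = np.tanh(np.hypot(m_r, m_i))
    mask_phase = np.arctan2(m_i, m_r)
    # Scale the noisy magnitude and shift the noisy phase.
    est_mag = np.abs(spec) * mask_mag
    est_phase = np.angle(spec) + mask_phase
    return est_mag * np.exp(1j * est_phase)
```

Because tanh bounds the mask magnitude to [0, 1), this variant can only attenuate each time-frequency bin, while the phase term is free to rotate it.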
“…T models process the waveform directly to obtain the target speech [1]. T-F models process the spectrum after the short-time fast Fourier transform (STFT) [2][3][4]. Generally speaking, for speech enhancement, it's the T-F structure of speech that is enhanced.…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, complex multiplication captures rotation in the complex domain and can easily manipulate the signal phase. Thus, complex neural networks are found to be more effective for applications such as wireless communication (Marseet and Sahin 2017) and noise suppression (Hu et al. 2020; Bassey, Qian, and Li 2021). Compared to real-valued networks, complex representation also restricts the degree of freedom of the parameters by enforcing correlation between the real and imaginary parts, which enhances the generalization capacity of the model in other applications.…”
Section: Hybridbeam Architecture (mentioning)
confidence: 99%
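
The rotation property mentioned in this statement can be checked numerically. The sketch below is only an illustration of the general fact that multiplying by a unit-modulus complex number shifts phase and leaves magnitude unchanged; the helper name `rotate_phase` and the sample values are made up for the example.

```python
import numpy as np

def rotate_phase(z, theta):
    """Rotate complex values by theta radians via complex multiplication."""
    return z * np.exp(1j * theta)

# Two sample bins: one on the real axis, one on the imaginary axis.
z = np.array([1.0 + 0.0j, 0.0 + 2.0j])
rotated = rotate_phase(z, np.pi / 2)

# Magnitudes are preserved; only the phase has changed.
magnitudes_before = np.abs(z)
magnitudes_after = np.abs(rotated)
```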
“…Previous research suggests that the complex ratio masks (CRMs) outperform both the binary masks (BMs) and real-value ratio masks (RMs) on speech separation [30,31] and enhancement [32] tasks. For this reason, the complex ideal ratio mask (cIRM) $m_{t,f}$ of the target speech is estimated in the separation module.…”
Section: TF Masking Based Speech Separation (mentioning)
confidence: 99%
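
As a rough illustration of what a cIRM is, the sketch below computes the standard textbook form: the complex quotient of the clean and noisy spectra, written out in real and imaginary parts. The function name, argument names, and the small `eps` stabiliser are assumptions for the example and are not taken from the citing paper.

```python
import numpy as np

def complex_ideal_ratio_mask(noisy, clean, eps=1e-8):
    """Return the complex mask m such that m * noisy is approximately clean."""
    # |Y|^2, with eps (an assumption) guarding against division by zero.
    denom = noisy.real ** 2 + noisy.imag ** 2 + eps
    # Real and imaginary parts of clean / noisy.
    m_r = (noisy.real * clean.real + noisy.imag * clean.imag) / denom
    m_i = (noisy.real * clean.imag - noisy.imag * clean.real) / denom
    return m_r + 1j * m_i

# Toy spectra with two time-frequency bins.
noisy = np.array([1.0 + 1.0j, 2.0 - 1.0j])
clean = np.array([0.5 + 0.2j, 1.0 - 0.3j])
mask = complex_ideal_ratio_mask(noisy, clean)
reconstructed = mask * noisy
```

Unlike a real-valued ratio mask, this mask can correct the phase of each bin as well as its magnitude, which is why applying it to the noisy spectrum recovers the clean one up to the `eps` term.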