Interspeech 2021
DOI: 10.21437/interspeech.2021-296

DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement

Cited by 43 publications (9 citation statements)
References: 0 publications
“…Both the encoder and decoder are comprised of 2D causal convolution, batch normalization [16], and PReLU [17]. Between the encoder and decoder, DPRNN [12,13] is inserted to model the multidimensional dependencies and Skip Connection concatenates the output of each encoder to the input of the corresponding decoder (red line in Fig. 1).…”
Section: Coarse Enhancement Module; citation type: mentioning
confidence: 99%
“…The HB spectrum is enhanced by a lightweight NSNet [2] and the WB spectrum is enhanced by an HGCN that is updated in the following aspects. 1) The dual-path encoder and DPRNN [12,13] are introduced to take full advantage of the features. 2) Cosine is adopted to model the harmonic peak-valley structure, and the voiced region detection (VRD) is judged based on the harmonic integration significance.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…For phase retrieval, the complex ratio mask (CRM) is a widely used training target [25], which is denoted as a complex value M_c(t, f). In [22], we used CRM to recover the phase implicitly and the denoising process can be expressed as the complex product of the mask and the noisy speech as…”
Section: A Problem Formulation; citation type: mentioning
confidence: 99%
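
The quoted statement breaks off before its equation, so the following is only a minimal sketch of the general idea it names: multiplying a complex mask M_c(t, f) with the noisy spectrum bin by bin. The function name `apply_crm`, the nested-list layout (frames by frequency bins), and the sample values are assumptions made for illustration and are not taken from the citing paper.

```python
def apply_crm(mask, noisy):
    """Apply a complex ratio mask to a noisy spectrogram, bin by bin.

    Both arguments are nested lists indexed as [frame][frequency_bin]
    and hold Python complex numbers (layout assumed for illustration).
    """
    enhanced = []
    for mask_frame, noisy_frame in zip(mask, noisy):
        frame = []
        for m, y in zip(mask_frame, noisy_frame):
            # Complex product written out in real and imaginary parts:
            # (m_r + j m_i)(y_r + j y_i)
            real = m.real * y.real - m.imag * y.imag
            imag = m.real * y.imag + m.imag * y.real
            frame.append(complex(real, imag))
        enhanced.append(frame)
    return enhanced


# Hypothetical single-bin example: the mask scales and rotates the noisy bin.
example = apply_crm([[0.5 - 0.5j]], [[1 + 1j]])
```

Because the mask is complex, one multiplication changes both magnitude and phase, which is what the statement means by recovering the phase implicitly.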
“…We use 5 2-D convolutional layers in the encoder and set the strides as {(2,1), (2,1), (2,1), (1,1), (1,1)}. Note that the strides in the last two convolutional layers are (1,1) for sufficient frequency resolution of the features fed into the DPRNN module, which we found is important for speech quality [22]. We use two DPRNN modules and the resulting baseline DPCRN has 0.53 M trainable parameters.…”
Section: A Problem Formulation; citation type: mentioning
confidence: 99%
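
To make the stride list in the statement above concrete, the sketch below tracks how many frequency bins remain after each of the five encoder layers. It reads the first element of each stride pair as the frequency stride, which is an interpretation of the quote's remark about frequency resolution. The padding rule (output size equal to the input size divided by the stride, rounded up) and the 257-bin input are assumptions for illustration, since the quote gives neither kernel sizes nor padding.

```python
import math

# Strides as quoted; the first element is read as the frequency stride (assumption).
STRIDES = [(2, 1), (2, 1), (2, 1), (1, 1), (1, 1)]


def frequency_bins_per_layer(num_bins, strides=STRIDES):
    """Return the number of frequency bins after each encoder layer.

    Assumes padding such that each layer outputs ceil(input / stride) bins.
    """
    sizes = []
    for freq_stride, _time_stride in strides:
        num_bins = math.ceil(num_bins / freq_stride)
        sizes.append(num_bins)
    return sizes


# Hypothetical input of 257 bins (for example, a 512-point FFT).
print(frequency_bins_per_layer(257))
```

Under these assumptions only the first three layers downsample along frequency, for an overall factor of 8, and the last two layers leave the resolution unchanged before the DPRNN module.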