DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement

Le, Xiaohuai; Chen, Hongsheng; Chen, Kai; Lü, Jing

doi:10.21437/interspeech.2021-296

Cited by 43 publications

(9 citation statements)

References 0 publications


“…Both the encoder and decoder are comprised of 2D causal convolution, batch normalization [16], and PReLU [17]. Between the encoder and decoder, DPRNN [12,13] is inserted to model the multidimensional dependencies and Skip Connection concatenates the output of each encoder to the input of the corresponding decoder (red line in Fig. 1).…”

Section: Coarse Enhancement Module (mentioning)

confidence: 99%
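The "causal convolution" named in the statement above can be illustrated with a minimal sketch. It is simplified to one dimension (time only) and is not the citing authors' implementation; the function name `causal_conv1d` and the kernel values are hypothetical. The point it shows is that padding is applied on the past side only, so an output frame never depends on future frames.

```python
def causal_conv1d(frames, kernel):
    """1-D causal convolution: output[t] uses frames[t-k+1 .. t] only."""
    k = len(kernel)
    # Pad with zeros on the past side only, never on the future side.
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]


# Hypothetical kernel; the last tap multiplies the current frame.
KERNEL = [0.5, 0.25, 0.25]

out_a = causal_conv1d([1.0, 2.0, 3.0, 4.0], KERNEL)
out_b = causal_conv1d([1.0, 2.0, 3.0, 99.0], KERNEL)

# Changing the last (future) frame leaves all earlier outputs untouched.
print(out_a[:3] == out_b[:3])
```

In a real-time enhancement model this property is what lets each frame be processed as soon as it arrives.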

“…The HB spectrum is enhanced by a lightweight NSNet [2] and the WB spectrum is enhanced by an HGCN that is updated in the following aspects. 1) The dual-path encoder and DPRNN [12,13] are introduced to take full advantage of the features. 2) Cosine is adopted to model the harmonic peak-valley structure, and the voiced region detection (VRD) is judged based on the harmonic integration significance.…”

Section: Introduction (mentioning)

confidence: 99%


Harmonic gated compensation network plus for ICASSP 2022 DNS CHALLENGE

Wang, Zhu, Gao et al. 2022

Preprint


The harmonic structure of speech is resistant to noise, but the harmonics may still be partially masked by noise. Therefore, we previously proposed a harmonic gated compensation network (HGCN) to predict the full harmonic locations based on the unmasked harmonics and process the result of a coarse enhancement module to recover the masked harmonics. In addition, the auditory loudness loss function is used to train the network. For the DNS Challenge, we update HGCN in the following aspects, resulting in HGCN+. First, a high-band module is employed to help the model handle full-band signals. Second, cosine is used to model the harmonic structure more accurately. Then, the dual-path encoder and dual-path RNN (DPRNN) are introduced to take full advantage of the features. Finally, a gated residual linear structure replaces the gated convolution in the compensation module to increase the receptive field of frequency. The experimental results show that each updated module brings a performance improvement to the model. HGCN+ also outperforms the referenced models on both wide-band and full-band test sets.


“…For phase retrieval, the complex ratio mask (CRM) is a widely used training target [25] which is denoted as a complex value M_c(t, f). In [22], we used CRM to recover the phase implicitly and the denoising process can be expressed as the complex product of the mask and the noisy speech as…”

Section: A Problem Formulation (mentioning)

confidence: 99%
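As a rough illustration of what "the complex product of the mask and the noisy speech" means in general, the sketch below multiplies a complex mask with a complex spectrogram bin by bin. It does not reproduce the equation truncated in the quotation; the names `apply_crm` and `complex_product_expanded` and the toy values are assumptions made for this example.

```python
def apply_crm(noisy_spec, mask):
    """Multiply each time-frequency bin of the noisy spectrogram by its mask value."""
    return [
        [m * y for m, y in zip(mask_row, spec_row)]
        for mask_row, spec_row in zip(mask, noisy_spec)
    ]


def complex_product_expanded(m, y):
    """The same product written out in real and imaginary parts."""
    real = m.real * y.real - m.imag * y.imag
    imag = m.real * y.imag + m.imag * y.real
    return complex(real, imag)


# Toy 1x2 spectrogram (one frame, two frequency bins); values are hypothetical.
noisy = [[1 + 1j, 2 - 1j]]
mask = [[0 + 1j, 0.5 + 0j]]

enhanced = apply_crm(noisy, mask)
print(enhanced)
```

Because the mask is complex, it rescales the magnitude and rotates the phase of each bin in a single operation, which is why phase is said to be recovered implicitly.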

“…We use 5 2-D convolutional layers in the encoder and set the strides as {(2,1), (2,1), (2,1), (1,1), (1,1)}. Note that the strides in the last two convolutional layers are (1,1) for sufficient frequency resolution of the features fed into the DPRNN module, which we found is important for speech quality [22]. We use two DPRNN modules and the resulting baseline DPCRN has 0.53 M trainable parameters.…”

Section: A Problem Formulation (mentioning)

confidence: 99%
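The effect of the quoted strides on frequency resolution can be checked with a short calculation. This is a sketch under stated assumptions: the stride tuples are read as (frequency, time), and the input size of 257 bins, kernel size 3, and padding 1 along frequency are hypothetical values that do not appear in the statement. Only the stride list is taken from the quotation.

```python
def conv_out_size(size, kernel, stride, padding):
    """Standard output-length formula for a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1


# Strides quoted in the statement above, read here as (frequency, time).
STRIDES = [(2, 1), (2, 1), (2, 1), (1, 1), (1, 1)]


def frequency_sizes(num_bins, kernel=3, padding=1):
    """Frequency dimension after each of the five encoder layers."""
    sizes = []
    for freq_stride, _time_stride in STRIDES:
        num_bins = conv_out_size(num_bins, kernel, freq_stride, padding)
        sizes.append(num_bins)
    return sizes


# Hypothetical input with 257 frequency bins.
print(frequency_sizes(257))  # [129, 65, 33, 33, 33]
```

Under these assumptions the last two layers leave the frequency dimension at 33 rather than halving it twice more, which matches the statement's point that stride (1,1) preserves frequency resolution for the DPRNN module.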

“…We find that the interruption of the mask updates can be effectively circumvented in a network with parallel RNNs when applying the Skip-RNN strategy. Parallel RNNs have been used in various high performance SE models such as the convolutional U-net for speech enhancement (CRUSE) [12], [21], the gated convolutional recurrent network (GCRN) [11] and the dual-path convolutional recurrent network (DPCRN) [22]. In this paper, we take DPCRN as an example to discuss the application of Skip-RNN, which can be easily extended to CRUSE and GCRN.…”

Section: Introduction (mentioning)

confidence: 99%


Inference skipping for more efficient real-time speech enhancement with parallel RNNs

Le, Lei, Chen et al. 2022

Preprint

Self Cite


Deep neural network (DNN) based speech enhancement models have attracted extensive attention due to their promising performance. However, it is difficult to deploy a powerful DNN in real-time applications because of its high computational cost. Typical compression methods such as pruning and quantization do not make good use of the data characteristics. In this paper, we introduce the Skip-RNN strategy into speech enhancement models with parallel RNNs. The states of the RNNs update intermittently without interrupting the update of the output mask, which leads to significant reduction of computational load without evident audio artifacts. To better leverage the difference between the voice and the noise, we further regularize the skipping strategy with voice activity detection (VAD) guidance, saving more computational load. Experiments on a high-performance speech enhancement model, dual-path convolutional recurrent network (DPCRN), show the superiority of our strategy over strategies like network pruning or directly training a smaller model. We also validate the generalization of the proposed strategy on two other competitive speech enhancement models.
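The intermittent state update described in this abstract can be sketched with a toy scalar recurrent cell. This is not the authors' model: the cell, the weights (`w_x`, `w_h`, `w_gate`, `b_gate`), and the function name `skip_rnn` are hypothetical, and the gating rule is a simplified version of the general skip-update idea (binarise an update probability, and let it accumulate over skipped frames). It also omits the parallel-RNN arrangement and the VAD guidance mentioned in the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def skip_rnn(inputs, w_x=0.5, w_h=0.9, w_gate=1.0, b_gate=-1.0):
    """Toy scalar RNN whose state is recomputed only on some frames."""
    state = 0.0
    update_prob = 1.0  # forces an update on the first frame
    states, updated = [], []
    for x in inputs:
        do_update = update_prob >= 0.5  # binarise the update probability
        if do_update:
            # Full (costly) recurrent update.
            state = math.tanh(w_x * x + w_h * state)
            update_prob = sigmoid(w_gate * state + b_gate)
        else:
            # Skip: keep the old state and let the probability accumulate.
            delta = sigmoid(w_gate * state + b_gate)
            update_prob = update_prob + min(delta, 1.0 - update_prob)
        states.append(state)
        updated.append(do_update)
    return states, updated


states, updated = skip_rnn([1.0, 0.8, 0.6, 0.9, 0.7, 0.5])
print(sum(updated), "updates out of", len(updated), "frames")
```

On skipped frames the recurrent computation is not run at all and the previous state is reused, which is where the computational saving comes from.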
