ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747578

FRCRN: Boosting Feature Representation Using Frequency Recurrence for Monaural Speech Enhancement

Abstract: Convolutional recurrent networks (CRN) integrating a convolutional encoder-decoder (CED) structure and a recurrent structure have achieved promising performance for monaural speech enhancement. However, feature representation across frequency context is highly constrained due to limited receptive fields in the convolutions of CED. In this paper, we propose a convolutional recurrent encoder-decoder (CRED) structure to boost feature representation along the frequency axis. The CRED applies frequency recurrence on…
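The truncated abstract describes recurrence applied along the frequency axis. As a rough illustration only (not the paper's FRCRN implementation; the tanh cell, the weight shapes, and the sizes below are all assumptions), frequency recurrence treats the frequency bins of each time frame as a sequence and scans a recurrent cell across them:

```python
import numpy as np

def frequency_recurrence(spec, W, U, b):
    """Run a simple tanh recurrence along the frequency axis.

    spec: (T, F, C) feature map (time frames, frequency bins, channels).
    Returns hidden states of shape (T, F, H).
    """
    T, F, C = spec.shape
    H = U.shape[0]
    out = np.zeros((T, F, H))
    for t in range(T):
        h = np.zeros(H)
        for f in range(F):  # recurrence runs over frequency, not time
            h = np.tanh(spec[t, f] @ W + h @ U + b)
            out[t, f] = h
    return out

rng = np.random.default_rng(0)
T, F, C, H = 4, 16, 3, 8
spec = rng.standard_normal((T, F, C))
W = rng.standard_normal((C, H)) * 0.1   # input-to-hidden weights (toy values)
U = rng.standard_normal((H, H)) * 0.1   # hidden-to-hidden weights (toy values)
b = np.zeros(H)
h = frequency_recurrence(spec, W, U, b)
print(h.shape)  # (4, 16, 8)
```

Each frequency bin's hidden state thus depends on all lower bins in the same frame, giving a full receptive field along frequency — the limitation of plain convolutions that the abstract points to.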

Cited by 35 publications (6 citation statements)
References 27 publications
“…The CFSMN module contains a CFSMN layer to learn the long-range frequency correlations. The details of the CFSMN layer are provided in our previous work [21]. Here we use the same settings for CFSMN.…”
Section: Complex Dual-path Encoder
confidence: 99%
“…The outputs of the two decoders are weighted and summed. Time-domain approaches that operate on the raw waveform of speech signals and time-frequency (TF) domain approaches [10][11][12][13][14][15][16][17][18][19][20][21] that manipulate the speech spectrogram have been proposed. Although the time-domain approaches have achieved some success, the TF domain approach has dominated the research trend.…”
Section: Introduction
confidence: 99%
“…The CNN module is able to extract high-level features but mainly focuses on local temporal-spectral patterns [60]. Combining their advantages, the CRN structure has been shown to be very effective for speech enhancement [58,[61][62][63]. Motivated by [58,61], we determined three convolution layers for the encoder, three transposed convolution layers for the decoder, and two LSTM layers between them.…”
Section: Structure Of Dven
confidence: 99%
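The mirrored encoder/decoder shape bookkeeping the quoted passage describes (three convolution layers down, three transposed-convolution layers up) can be sketched with the standard 1-D convolution size formulas. The kernel size 3, stride 2, padding 1, and 161 frequency bins below are illustrative assumptions, not values from the cited paper:

```python
def conv_out(n, k=3, s=2, p=1):
    # output length of a 1-D convolution along the frequency axis
    return (n + 2 * p - k) // s + 1

def deconv_out(n, k=3, s=2, p=1):
    # output length of the matching transposed convolution
    return (n - 1) * s - 2 * p + k

f = 161  # e.g. 161 frequency bins from a 320-point STFT (assumption)
enc = [f]
for _ in range(3):  # three encoder convolution layers
    enc.append(conv_out(enc[-1]))
dec = [enc[-1]]
for _ in range(3):  # three decoder transposed-convolution layers
    dec.append(deconv_out(dec[-1]))
print(enc)  # [161, 81, 41, 21]
print(dec)  # [21, 41, 81, 161]
```

With these (assumed) hyperparameters each transposed layer exactly undoes one encoder layer, so the decoder restores the original frequency resolution — the property that lets the recurrent layers in between operate on a compact bottleneck.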
“…With a deep structure of hidden layers between input and output layers, deep learning constructs complex models for nonlinear relations and enables feature representation from the lower layers to model the complex input data. Given a speech dataset of the clean-noisy pairs, a neural model learns to transform the noisy magnitude spectra to their clean counterparts (mapping-based SE) or estimates the time-frequency masks (masking-based SE), such as the ideal binary mask (IBM) [6], [7], ideal ratio mask (IRM) [8], [9], and spectral magnitude mask (SMM) [10]. In spectral mapping, the models are trained using a direct mapping rule, where the noisy spectral features are learned to estimate the clean spectral features.…”
Section: Introduction
confidence: 99%
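The ideal ratio mask (IRM) and spectral magnitude mask (SMM) named in the quote have simple closed forms. A minimal numpy sketch — the random spectrograms, the additive noisy magnitude Y = S + N, and the clipping range are assumptions for illustration, not the cited papers' setups:

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5):
    # IRM: (S^2 / (S^2 + N^2))^beta per time-frequency bin, in [0, 1]
    return (clean_mag**2 / (clean_mag**2 + noise_mag**2 + 1e-12)) ** beta

def spectral_magnitude_mask(clean_mag, noisy_mag):
    # SMM: |S| / |Y|; commonly clipped to a fixed range (here [0, 1], an assumption)
    return np.clip(clean_mag / (noisy_mag + 1e-12), 0.0, 1.0)

rng = np.random.default_rng(1)
S = np.abs(rng.standard_normal((100, 257)))  # clean magnitude spectrogram (toy)
N = np.abs(rng.standard_normal((100, 257)))  # noise magnitude (toy)
Y = S + N                                    # crude noisy magnitude (assumption)

irm = ideal_ratio_mask(S, N)
smm = spectral_magnitude_mask(S, Y)
enhanced = irm * Y  # masking-based SE: apply the mask to the noisy spectrogram
print(irm.shape)  # (100, 257)
```

A masking-based model is trained to predict such a mask from the noisy input, whereas a mapping-based model regresses the clean spectrum directly, as the quoted passage distinguishes.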