ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053544
Residual Recurrent Neural Network for Speech Enhancement

Abstract: Most current speech enhancement models use spectrogram features that require an expensive transformation and result in phase information loss. Previous work has overcome these issues by using convolutional networks to learn long-range temporal correlations across high-resolution waveforms. These models, however, are limited by memory-intensive dilated convolution and aliasing artifacts from upsampling. We introduce an end-to-end fully-recurrent hourglass-shaped neural network architecture with residual connect…

Cited by 16 publications (10 citation statements)
References 31 publications
“…The STFT loss is introduced in [34]. There are examples that use log-cosh as the loss function for the speech enhancement model, one built with GRU [41] and the other built with SRU [20]. We creatively used one of the evaluation metrics, specifically SNR, as a term in the loss function.…”
Section: Methods
Mentioning confidence: 99%
“…The STFT loss is introduced in [34]. There are examples that use log-cosh as the loss function for the speech enhancement model, one built with GRU [41] and the other built with SRU [20].…”
Section: Loss Function
Mentioning confidence: 99%
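The excerpts above describe a composite training objective combining a log-cosh distance with an SNR term. A minimal NumPy sketch of such an objective, assuming an illustrative weighting `alpha` (the cited papers' exact formulation and weighting are not given here):

```python
import numpy as np

def log_cosh_loss(clean, enhanced):
    # log(cosh(x)) rewritten as |x| + log1p(exp(-2|x|)) - log(2)
    # for numerical stability at large |x|
    x = enhanced - clean
    return np.mean(np.abs(x) + np.log1p(np.exp(-2.0 * np.abs(x))) - np.log(2.0))

def snr_term(clean, enhanced, eps=1e-8):
    # negative SNR in dB, so minimizing the loss maximizes SNR
    noise = enhanced - clean
    snr_db = 10.0 * np.log10((np.sum(clean ** 2) + eps) / (np.sum(noise ** 2) + eps))
    return -snr_db

def combined_loss(clean, enhanced, alpha=0.5):
    # hypothetical combination: log-cosh distance plus a weighted SNR term
    return log_cosh_loss(clean, enhanced) + alpha * snr_term(clean, enhanced)
```

Log-cosh behaves like squared error near zero and like absolute error for large residuals, which is why it is a common robust choice for waveform regression; the SNR term directly optimizes one of the evaluation metrics.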
“…The study [31] presents a completely E2E recurrent neural network (RNN) for enhancing single-channel speech. By lowering the feature resolution without sacrificing the information, an hourglass-shaped network effectively captured long-range temporal correlation.…”
Section: Related Studies
Mentioning confidence: 99%
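The hourglass idea in this excerpt can be sketched in NumPy: downsample the time axis by folding adjacent frames into channels, run a recurrence at the cheaper low rate, upsample, and add a residual connection. This is a sketch only; a plain tanh recurrence stands in for the paper's recurrent units, and all shapes are illustrative:

```python
import numpy as np

def tanh_rnn(x, W, U, b):
    # plain tanh recurrence standing in for a GRU/SRU; x: (T, D) -> (T, H)
    h = np.zeros(W.shape[0])
    out = np.empty((x.shape[0], h.size))
    for t in range(x.shape[0]):
        h = np.tanh(W @ h + U @ x[t] + b)
        out[t] = h
    return out

def downsample(x, r=2):
    # reduce time resolution by folding r adjacent frames into channels
    T, D = x.shape
    return x[: (T // r) * r].reshape(T // r, r * D)

def upsample(x, r=2):
    # unfold channels back into time steps (inverse of downsample)
    T2, D2 = x.shape
    return x.reshape(T2 * r, D2 // r)

def hourglass_block(x, W, U, b, r=2):
    # encoder narrows time resolution, the recurrence models long-range
    # context at the low rate, the decoder restores resolution, and a
    # residual connection adds the input back
    z = downsample(x, r)
    z = tanh_rnn(z, W, U, b)
    y = upsample(z, r)
    return x[: y.shape[0]] + y

rng = np.random.default_rng(0)
T, D, r = 16, 4, 2
W = rng.normal(scale=0.1, size=(r * D, r * D))
U = rng.normal(scale=0.1, size=(r * D, r * D))
b = np.zeros(r * D)
y = hourglass_block(rng.normal(size=(T, D)), W, U, b, r)
```

Folding frames into channels is a lossless resolution change (every sample is preserved, just rearranged), which matches the excerpt's point about lowering resolution without sacrificing information.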
“…The feed-forward network estimates the compatibility function with a single hidden layer (Vaswani et al. 2017). Different from the conventional Transformer encoder, in the feed-forward network the first fully connected layer is replaced by a gated recurrent unit (GRU) (Abdulbaqi et al. 2020) layer, because the GRU has a simpler structure and trains faster than local LSTMs. Moreover, the same dimension of attention maps is obtained from the input and output of one sub-layer in the U-Transformer, e.g., d_layer = 512 in the first sub-layer, to facilitate the residual connections.…”
Section: Speech Enhancement U-Transformer
Mentioning confidence: 99%
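A NumPy sketch of the modification this excerpt describes: a Transformer feed-forward sub-layer whose first linear layer is replaced by a GRU, with matching input/output dimensions so the residual connection works. The parameter names and dimensions here are illustrative, not taken from the cited paper:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_layer(x, p):
    # standard GRU recurrence over time; x: (T, d_in) -> (T, d_h)
    h = np.zeros(p["Wz"].shape[0])
    out = np.empty((x.shape[0], h.size))
    for t in range(x.shape[0]):
        z = sigmoid(p["Wz"] @ x[t] + p["Uz"] @ h + p["bz"])          # update gate
        r = sigmoid(p["Wr"] @ x[t] + p["Ur"] @ h + p["br"])          # reset gate
        n = np.tanh(p["Wn"] @ x[t] + p["Un"] @ (r * h) + p["bn"])    # candidate state
        h = (1.0 - z) * h + z * n
        out[t] = h
    return out

def ffn_sublayer_with_gru(x, p, W2, b2):
    # conventional FFN sub-layer: relu(x W1 + b1) W2 + b2; here the first
    # linear layer is swapped for a GRU, as in the excerpt, with a
    # residual connection around the sub-layer
    hidden = gru_layer(x, p)
    return x + hidden @ W2 + b2

rng = np.random.default_rng(1)
d_model, d_hidden, T = 8, 16, 10  # small stand-ins for e.g. d_layer = 512
p = {k: rng.normal(scale=0.1, size=(d_hidden, d_model if k[0] == "W" else d_hidden))
     for k in ("Wz", "Wr", "Wn", "Uz", "Ur", "Un")}
p.update({k: np.zeros(d_hidden) for k in ("bz", "br", "bn")})
x = rng.normal(size=(T, d_model))
W2 = rng.normal(scale=0.1, size=(d_hidden, d_model))
y = ffn_sublayer_with_gru(x, p, W2, np.zeros(d_model))
```

The second linear projection maps the GRU's hidden size back to d_model, so the sub-layer's input and output dimensions match and the residual addition is well-defined, which is the point the excerpt makes about equal attention-map dimensions.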