ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682830

Using Recurrences in Time and Frequency within U-net Architecture for Speech Enhancement

Abstract: When designing a fully-convolutional neural network, there is a trade-off between the receptive field size, the number of parameters, and the spatial resolution of features in the deeper layers of the network. In this work we present a novel network design, based on a combination of many convolutional and recurrent layers, that resolves this trade-off. We compare our solution with U-Net-based models known from the literature and with other baseline models on a speech enhancement task. We test our solution on TIMIT speech utterances combined …
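For a concrete picture of the design described in the abstract, the following is a minimal sketch (PyTorch assumed; ConvRecurrentBlock and all sizes are hypothetical names, not the authors' code) of a U-Net building block that interleaves a 2D convolution with bidirectional recurrences along the frequency and time axes of a spectrogram feature map.

```python
import torch
import torch.nn as nn

class ConvRecurrentBlock(nn.Module):
    def __init__(self, channels: int, hidden: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Bidirectional GRUs stand in for the recurrent layers; their 2*hidden
        # outputs are projected back to `channels` with 1x1 convolutions.
        self.freq_rnn = nn.GRU(channels, hidden, bidirectional=True, batch_first=True)
        self.time_rnn = nn.GRU(channels, hidden, bidirectional=True, batch_first=True)
        self.proj_f = nn.Conv2d(2 * hidden, channels, kernel_size=1)
        self.proj_t = nn.Conv2d(2 * hidden, channels, kernel_size=1)

    @staticmethod
    def _run_axis(x, rnn):
        # x: (batch, channels, seq, other); the recurrence runs over the `seq`
        # axis, with the `other` axis folded into the batch dimension.
        b, c, s, o = x.shape
        seq = x.permute(0, 3, 2, 1).reshape(b * o, s, c)
        out, _ = rnn(seq)                                     # (b*o, s, 2*hidden)
        return out.reshape(b, o, s, -1).permute(0, 3, 2, 1)   # (b, 2h, s, o)

    def forward(self, x):                  # x: (batch, channels, freq, time)
        x = torch.relu(self.conv(x))
        y = self.proj_f(self._run_axis(x, self.freq_rnn))        # over frequency
        z = self._run_axis(y.transpose(2, 3), self.time_rnn)     # over time
        return self.proj_t(z.transpose(2, 3)) + x                # residual

block = ConvRecurrentBlock(channels=16, hidden=32)
out = block(torch.randn(2, 16, 64, 100))   # (batch, channels, freq, time)
```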

Cited by 15 publications (9 citation statements) · References 19 publications (19 reference statements). Citing publications span 2019–2023.
“…The FGRU and TGRU used in TRU-Net are similar to the work in [32]. They used bidirectional long short-term memory (bi-LSTM) networks on the frequency axis and the time axis, combined with a 2D-CNN-based U-Net.…”
Section: Relation to Prior Work
confidence: 99%
“…They used bidirectional long short-term memory (bi-LSTM) networks on the frequency axis and the time axis, combined with a 2D-CNN-based U-Net. The difference is that bi-LSTM was utilized to increase performance in [32], whereas we employ an FGRU and a uni-directional TGRU, combined with the proposed lightweight 1D-CNN-based (frequency-axis) U-Net, to better handle the online inference scenario.…”
Section: Relation to Prior Work
confidence: 99%
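The directional trade-off this excerpt describes can be illustrated with a short sketch (PyTorch assumed; sizes hypothetical): a bidirectional time recurrence needs the whole utterance before producing output, while a unidirectional one carries its hidden state forward and can be stepped frame by frame for online inference.

```python
import torch
import torch.nn as nn

feat = 64                                      # hypothetical features per frame
bi_rnn = nn.GRU(feat, feat, bidirectional=True, batch_first=True)   # offline
uni_rnn = nn.GRU(feat, feat, batch_first=True)                      # streaming

frames = torch.randn(1, 100, feat)             # (batch, time, features)
offline_out, _ = bi_rnn(frames)                # consumes future context

state, online_out = None, []
for t in range(frames.size(1)):                # one frame at a time
    y, state = uni_rnn(frames[:, t:t + 1], state)
    online_out.append(y)
online_out = torch.cat(online_out, dim=1)      # equals uni_rnn(frames) output
```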
“…Considering that many state-of-the-art CNN speech enhancement methods have shortcuts [24], [25], we also investigate the effect of the proposed SE mechanisms when combined with shortcut-based CNNs. We build a shortcut-based convolutional network (SCN) that adds shortcuts between the corresponding layers in the encoder and decoder of the CNN used before, and then evaluate its performance with and without SE mechanisms.…”
Section: SE Generalization
confidence: 99%
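As a rough illustration of the setup in this excerpt (PyTorch assumed; SEBlock and decode_step are hypothetical names, not the citing authors' code), a squeeze-and-excitation gate can be applied after a decoder layer that receives a U-Net-style shortcut from the corresponding encoder layer:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (batch, channels, freq, time)
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze: global average pooling
        return x * w[:, :, None, None]         # excite: per-channel rescaling

def decode_step(dec_feat, enc_feat, conv, se=None):
    fused = torch.cat([dec_feat, enc_feat], dim=1)   # encoder-decoder shortcut
    out = conv(fused)
    return se(out) if se is not None else out        # with / without SE

conv = nn.Conv2d(32, 16, kernel_size=3, padding=1)   # 16+16 shortcut channels in
out = decode_step(torch.randn(1, 16, 64, 100),
                  torch.randn(1, 16, 64, 100), conv, se=SEBlock(16))
```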
“…Tan and Wang [24] proposed the convolutional recurrent network (CRN), which inserted two long short-term memory (LSTM) layers between the encoder and the decoder of the FCN. Grzywalski and Drgas [25] added gated recurrent unit (GRU) layers into each building block of the FCN. These models improve representational capability by exploiting the temporal modeling power of RNNs.…”
Section: Introduction
confidence: 99%
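A minimal sketch of the CRN bottleneck referenced here (PyTorch assumed; sizes hypothetical): two LSTM layers between the convolutional encoder and decoder, modeling temporal dynamics of the flattened encoder feature maps.

```python
import torch
import torch.nn as nn

class CRNBottleneck(nn.Module):
    def __init__(self, channels: int, freq_bins: int):
        super().__init__()
        d = channels * freq_bins               # flatten (channels, freq) per frame
        self.lstm = nn.LSTM(d, d, num_layers=2, batch_first=True)

    def forward(self, x):                      # x: (batch, channels, freq, time)
        b, c, f, t = x.shape
        seq = x.permute(0, 3, 1, 2).reshape(b, t, c * f)
        out, _ = self.lstm(seq)                # temporal modeling at bottleneck
        return out.reshape(b, t, c, f).permute(0, 2, 3, 1)

bottleneck = CRNBottleneck(channels=8, freq_bins=16)
z = bottleneck(torch.randn(1, 8, 16, 100))    # same shape in and out
```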