2019
DOI: 10.48550/arxiv.1902.04891
Preprint

FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks

Abstract: Deep dilated temporal convolutional networks (TCN) have proved to be very effective in sequence modeling. In this paper we propose several improvements of TCN for an end-to-end approach to monaural speech separation, which consist of 1) multi-scale dynamic weighted gated dilated convolutional pyramids network (FurcaPy), 2) gated TCN with intra-parallel convolutional components (FurcaPa), 3) weight-shared multi-scale gated TCN (FurcaSh), 4) dilated TCN with gated difference-convolutional component (FurcaSu), …
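The abstract's central building block is a gated dilated temporal convolution. As a rough illustration of that idea only (not the authors' FurcaNeXt variants, whose exact gating, weighting, and pyramid structures are defined in the paper), the NumPy sketch below stacks causal dilated 1-D convolutions with a tanh/sigmoid gate; the kernel size, channel count, and choice of gate are assumptions made for the example.

```python
# Hedged sketch of a gated dilated temporal convolution block in NumPy.
# All sizes and the tanh/sigmoid gating are illustrative assumptions, not the
# FurcaNeXt architecture itself.
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution.
    x: (T, C_in) input sequence, w: (K, C_in, C_out) kernel -> (T, C_out)."""
    T, C_in = x.shape
    K, _, C_out = w.shape
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, C_in)), x], axis=0)  # left-pad for causality
    out = np.zeros((T, C_out))
    for k in range(K):                        # sum contributions of each kernel tap
        out += xp[k * dilation:k * dilation + T] @ w[k]
    return out

def gated_dilated_block(x, w_filter, w_gate, dilation):
    """Gated activation over dilated convolutions: tanh(filter) * sigmoid(gate)."""
    gate = 1.0 / (1.0 + np.exp(-dilated_conv1d(x, w_gate, dilation)))
    return np.tanh(dilated_conv1d(x, w_filter, dilation)) * gate

# Toy usage: stack blocks with exponentially growing dilation (1, 2, 4, 8).
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 16))           # 1000 time steps, 16 channels
for d in (1, 2, 4, 8):
    wf = rng.standard_normal((3, 16, 16)) * 0.1
    wg = rng.standard_normal((3, 16, 16)) * 0.1
    x = gated_dilated_block(x, wf, wg, d)
print(x.shape)                                # (1000, 16)
```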

Cited by 8 publications (14 citation statements)
References 32 publications
“…The resulting weights can be applied to a reconstruction set of basis functions and summed together along the same sliding window to reconstruct the signal under a similar paradigm to overlap-and-add for the STFT. For internal masking, we evaluate both bi-directional long short-term memory (BLSTM) networks (the typical internals of earlier deep learning-based speech separation systems [1][2][3][4][11,15]) and temporal convolutional networks (TCN) [16] with dilated convolutions (popular in recent state-of-the-art separation techniques [5,6]).…”
Section: Network Configurations (mentioning)
confidence: 99%
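The quoted passage describes reconstructing a waveform by weighting a set of basis functions frame by frame and overlap-adding the results along the sliding window, analogous to inverse-STFT overlap-and-add. A minimal NumPy sketch of that reconstruction step follows; the frame length, hop size, and random basis are placeholder assumptions, whereas a real system would learn the basis jointly with the separation network.

```python
# Illustrative sketch (not taken from the cited papers): apply per-frame weights
# to a set of reconstruction basis functions and overlap-add the resulting frames.
import numpy as np

def overlap_add_reconstruct(weights, basis, hop):
    """weights: (n_frames, n_basis) per-frame coefficients
    basis:   (n_basis, frame_len) reconstruction basis functions
    hop:     hop size in samples between adjacent frames"""
    n_frames, _ = weights.shape
    frame_len = basis.shape[1]
    out = np.zeros((n_frames - 1) * hop + frame_len)
    frames = weights @ basis                          # (n_frames, frame_len) time-domain frames
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame     # overlap-and-add
    return out

# Toy usage with assumed sizes: 100 frames, 256 basis vectors, 32-sample frames, hop 16.
rng = np.random.default_rng(0)
w = rng.standard_normal((100, 256))
B = rng.standard_normal((256, 32))
signal = overlap_add_reconstruct(w, B, hop=16)
print(signal.shape)                                   # (1616,) = (100 - 1) * 16 + 32
```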
“…Great advancements have been made in recent years on solving the speech separation problem through deep learning-based techniques [1][2][3][4][5][6]. However, the overwhelming majority of research conducted thus far has used the wsj0-2mix dataset [1], which consists of synthetically-mixed studio recordings of read utterances from the WSJ0 corpus [7] and is not representative of many real-world scenarios in which overlapped speech may be present [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…Recent progress in deep learning-based speech separation has ignited the interest of the research community in time-domain approaches [1][2][3][4][5][6]. Compared with standard time-frequency domain methods, time-domain methods are designed to jointly model the magnitude and phase information and allow direct optimization with respect to both time-and frequency-domain differentiable criteria [7][8][9].…”
Section: Introduction (mentioning)
confidence: 99%
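The passage above notes that time-domain methods permit direct optimization with time-domain differentiable criteria, but does not name one. Scale-invariant SNR (SI-SNR) is a common choice in this line of work, so the sketch below shows that criterion purely as an illustrative example of such an objective computed on raw waveforms.

```python
# Hedged example of one widely used time-domain criterion, scale-invariant SNR
# (SI-SNR); the quoted text does not specify which loss the cited systems use.
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio between two 1-D waveforms, in dB."""
    target = target - target.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the target to obtain the scaled target component.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps) /
                         (np.dot(e_noise, e_noise) + eps))

# Toy check: a cleanly scaled copy of the target scores far higher than a noisy one.
rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
print(si_snr(0.5 * s, s))                          # very high: scale does not matter
print(si_snr(s + rng.standard_normal(16000), s))   # roughly 0 dB
```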
“…The adaptive front-end approaches aim at replacing the short-time Fourier transform (STFT) with a differentiable transform to build a front-end that can be learned jointly with the separation network. Separation is applied to the front-end output as with the conventional time-frequency domain methods applying the separation processes to spectrogram inputs [3][4][5]. Being independent of the traditional time-frequency analysis paradigm, these systems are able to have a much more flexible choice on the window size and the number of basis functions for the front-end.…”
Section: Introduction (mentioning)
confidence: 99%
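The quoted passage describes replacing the STFT with a differentiable analysis transform whose window size and number of basis functions can be chosen freely. The NumPy sketch below illustrates such a windowed analysis front-end; the window length, hop, basis count, and ReLU nonlinearity are assumptions standing in for components that would be learned jointly with the separation network.

```python
# Illustrative adaptive front-end: a bank of basis functions applied to overlapping
# windows of the waveform in place of the fixed STFT. The random basis stands in
# for one that would be trained end to end; all sizes here are assumptions.
import numpy as np

def analysis_frontend(signal, basis, hop):
    """signal: (n_samples,) waveform
    basis:  (n_basis, frame_len) analysis basis functions
    hop:    hop size in samples
    Returns a (n_frames, n_basis) nonnegative representation."""
    n_basis, frame_len = basis.shape
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.maximum(frames @ basis.T, 0.0)   # ReLU, as is common in learned encoders

# Toy usage: 1 s of audio at 8 kHz, 16-sample windows, 512 basis functions, hop 8.
rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
B = rng.standard_normal((512, 16))
rep = analysis_frontend(x, B, hop=8)
print(rep.shape)                               # (999, 512)
```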