2022 30th European Signal Processing Conference (EUSIPCO)
DOI: 10.23919/eusipco55093.2022.9909855

Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation

Abstract: This is a repository copy of Receptive field analysis of temporal convolutional networks for monaural speech dereverberation.

Cited by 6 publications (5 citation statements). References 23 publications.
“…All SISDR results above T_lim = 1.95 s are within 0.5 dB of each other, suggesting Conv-TasNet is more invariant to the TSL limit if the limit is sufficiently large. This is possibly due to the 1.53 s receptive field of the Conv-TasNet models being smaller than these particular TSL limits [24].…”
Section: Transformer vs Convolutional Model (mentioning)
Confidence: 97%
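The 1.53 s figure follows from the usual receptive-field arithmetic for stacked dilated convolutions. Below is a minimal sketch that reproduces it, assuming the standard Conv-TasNet configuration (kernel size 3, eight dilated blocks per repeat with dilations 1, 2, ..., 128, three repeats, and an 8 kHz encoder with a 16-sample window and 8-sample stride); the cited models may use different hyperparameters.

def tcn_receptive_field_frames(kernel=3, blocks=8, repeats=3):
    """Receptive field of stacked dilated 1-D convolutions, in encoder frames."""
    rf = 1
    for _ in range(repeats):
        for b in range(blocks):
            rf += (kernel - 1) * 2 ** b  # each dilated conv widens the field
    return rf

def frames_to_seconds(frames, stride=8, win=16, fs=8000):
    """Convert an encoder-frame count back to seconds of input signal."""
    return ((frames - 1) * stride + win) / fs

rf = tcn_receptive_field_frames()   # 1531 frames
print(frames_to_seconds(rf))        # ~1.53 s, matching the quoted value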
“…Transformers are also typically unsuitable for continuous processing, as the entire sequence is required to compute self-attention. To address these issues, input signals are processed in overlapping blocks of 4 s for evaluation and inference, as this has been shown to be an optimal signal length for attention-based enhancement models [18]. A 50% overlap with a Hann window is used to cross-fade each block with one another.…”
Section: Block Processing for Longer Inputs (mentioning)
Confidence: 99%
“…A 50% overlap with a Hann window is used to cross-fade each block with one another. Models are trained with 4 s signal length limits [18].…”
Section: Block Processing for Longer Inputs (mentioning)
Confidence: 99%
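As a concrete illustration, the block-processing scheme in the two statements above can be implemented as a simple overlap-add loop. This is a sketch only: `enhance` stands in for whatever per-block enhancement model is used, and the 4 s block length, 50% overlap, and Hann cross-fade follow the quoted description.

import numpy as np

def process_in_blocks(x, enhance, fs=16000, block_s=4.0):
    """Enhance a long 1-D signal in overlapping blocks with Hann cross-fade."""
    block = int(block_s * fs)
    hop = block // 2                        # 50% overlap
    window = np.hanning(block)              # Hann cross-fade window
    if len(x) <= block:
        n_hops = 1
    else:
        n_hops = int(np.ceil((len(x) - block) / hop)) + 1
    pad = (n_hops - 1) * hop + block - len(x)
    padded = np.pad(x, (0, pad))
    out = np.zeros_like(padded, dtype=float)
    norm = np.zeros_like(padded, dtype=float)
    for i in range(n_hops):
        s = i * hop
        seg = enhance(padded[s:s + block])  # per-block enhancement model
        out[s:s + block] += window * seg    # overlap-add with cross-fade
        norm[s:s + block] += window
    return (out / np.maximum(norm, 1e-8))[:len(x)]

Dividing by the accumulated window energy keeps the output scale constant even at the signal edges, where the overlapping Hann windows do not sum exactly to one.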
“…The Conv-TasNet speech separation model has been widely studied and adapted for a number of speech enhancement tasks [5, 9-11]. Conv-TasNet generally performs very well on clean speech mixtures, with a very low computational cost compared to the most performant speech separation models [6, 12, 13] on the WSJ0-2Mix benchmark [14].…”
Section: Introduction (mentioning)
Confidence: 99%
“…The Conv-TasNet model uses a sequence model known as a TCN. It was recently shown that the optimal RF of TCNs in dereverberation models varies with the reverberation time when the model size is sufficiently large [10]. Furthermore, it was shown that multi-dilation TCN models can be trained to implicitly weight differently-dilated convolutional kernels, optimally focusing within the RF on more or less temporal context according to the reverberation time in the data for dereverberation tasks [16], i.e.…”
Section: Introduction (mentioning)
Confidence: 99%
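For illustration, a multi-dilation block of the kind described can be sketched as parallel convolutions with different dilations whose outputs are combined by learned weights, letting the network emphasise shorter or longer temporal context. This is a hypothetical sketch of the idea only; the actual block design in [16] may differ.

import torch
import torch.nn as nn

class MultiDilationBlock(nn.Module):
    """Residual block mixing convolution branches with different dilations."""

    def __init__(self, channels, kernel=3, dilations=(1, 8)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel, dilation=d,
                      padding=d * (kernel - 1) // 2)  # "same" length output
            for d in dilations
        )
        # One learnable mixing weight per dilation branch.
        self.mix = nn.Parameter(torch.ones(len(dilations)))

    def forward(self, x):  # x: (batch, channels, time)
        w = torch.softmax(self.mix, dim=0)
        return x + sum(wi * conv(x) for wi, conv in zip(w, self.convs))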