ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414265

Self-Attention Generative Adversarial Network for Speech Enhancement

Abstract: Existing generative adversarial networks (GANs) for speech enhancement solely rely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer…
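The sketch below illustrates the kind of non-local self-attention layer the abstract describes, written in PyTorch as one plausible reading of the approach rather than the authors' released code; the class name SelfAttention1d, the channel-reduction factor, and the exact placement between SEGAN (de)convolutions are assumptions made for illustration.

```python
# Minimal sketch (assumptions, not the authors' code) of a 1-D self-attention
# layer in the style of non-local / SAGAN attention, inserted after a
# (de)convolutional layer operating on raw-waveform feature maps of shape
# (batch, channels, time).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention1d(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces.
        self.query = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        # Learnable scale so the block starts out as an identity mapping.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t = x.shape
        q = self.query(x).permute(0, 2, 1)         # (b, t, c//r)
        k = self.key(x)                            # (b, c//r, t)
        attn = F.softmax(torch.bmm(q, k), dim=-1)  # (b, t, t) pairwise weights
        v = self.value(x)                          # (b, c, t)
        out = torch.bmm(v, attn.permute(0, 2, 1))  # aggregate values over time
        return self.gamma * out + x                # residual connection

# Hypothetical usage between two encoder convolutions of a SEGAN-like model:
# feats = conv_layer(waveform)                  # (batch, channels, time)
# feats = SelfAttention1d(feats.size(1))(feats)
```

Initializing gamma at zero lets the layer behave as an identity at the start of training, so the network can rely on the convolutional features first and gradually weight in the attention map.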

Cited by 26 publications (12 citation statements) · References 26 publications (54 reference statements)

“…Although the parameters of the classic Attention-Wave-U-Net [14] and the causal DEMUCS [15] are fewer than those of SECS-U-Net, the metric scores in Tables 3 and 4 show that their overall enhancement performance is inferior to SECS-U-Net. Furthermore, the parameters of our model account for only 29.99% of those of the recently proposed SASEGAN [7]. In addition, compared with the latest Sinc-SEGAN [9], the parameters of SECS-U-Net are 32.99% of its parameter count.…”
Section: Parameter Comparisons of Different Methods
Citation type: mentioning (confidence: 79%)
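Parameter ratios such as the 29.99% quoted above can be obtained by counting trainable parameters of both models and dividing; the sketch below assumes PyTorch models named secs_u_net and sasegan, which are hypothetical stand-ins for the cited implementations.

```python
# Small illustration (not from the cited paper) of computing a parameter ratio
# between two PyTorch models.
def count_parameters(model) -> int:
    # Count only trainable parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

ratio = count_parameters(secs_u_net) / count_parameters(sasegan)
print(f"SECS-U-Net parameters are {100 * ratio:.2f}% of SASEGAN's")
```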
“…Table 3 shows the scores of the speech quality metrics for SECS-U-Net and the reference models on the open test set [30]. In Table 3, we reproduced Wiener [3], SEGAN [4], and SASEGAN [7]; the scores of the other models are taken from their papers.…”
Section: Metrics Comparisons of Different Methods
Citation type: mentioning (confidence: 99%)
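The speech quality scores referred to above are typically metrics such as PESQ. As a hedged illustration, not tied to the cited paper's evaluation code, wideband PESQ for a single utterance could be computed with the third-party pesq and soundfile packages; the file names clean.wav and enhanced.wav are hypothetical.

```python
# Minimal sketch of computing wideband PESQ for one enhanced utterance,
# assuming 16 kHz mono WAV files (hypothetical paths).
import soundfile as sf
from pesq import pesq

clean, fs = sf.read("clean.wav")       # reference (clean) speech
enhanced, _ = sf.read("enhanced.wav")  # output of the enhancement model

score = pesq(fs, clean, enhanced, "wb")  # wideband PESQ, expects fs == 16000
print(f"PESQ (wb): {score:.2f}")
```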
“…In [48], the authors investigated a multi-head attention network to estimate linear prediction coefficients (LPC) for clean and noisy speech signals. In [49-52], the authors employed convolutional neural network (CNN)- and generative adversarial network (GAN)-based architectures with attention mechanisms for end-to-end ASE applications.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)