2021
DOI: 10.48550/arxiv.2106.15813
Preprint

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

Abstract: Single-channel speech enhancement (SE) is an important task in speech processing. A widely used framework combines an analysis/synthesis filterbank with a mask prediction network, such as the Conv-TasNet architecture. In such systems, the denoising performance and computational efficiency are mainly affected by the structure of the mask prediction network. In this study, we aim to improve the sequential modeling ability of Conv-TasNet architectures by integrating Conformer layers into a new mask prediction net…
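To make the framework described in the abstract concrete, here is a minimal sketch (not the authors' implementation) of a mask-based SE model in PyTorch: a learned analysis filterbank (encoder), a stack of Conformer-style blocks that predicts a mask on the encoder features, and a synthesis filterbank (decoder). The class names (MaskBasedSE, ConformerBlock), layer sizes, and the use of standard softmax self-attention instead of the paper's linear-complexity attention are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ConformerBlock(nn.Module):
    """Simplified Conformer block: half-step FFN, self-attention, depthwise conv, half-step FFN."""

    def __init__(self, dim=256, heads=4, kernel=15):
        super().__init__()
        self.ff1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.norm_attn = nn.LayerNorm(dim)
        # NOTE: standard O(T^2) softmax attention; the paper replaces this with a linear-complexity variant.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_conv = nn.LayerNorm(dim)
        self.dwconv = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)  # depthwise conv over time
        self.ff2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, x):                              # x: (batch, frames, dim)
        x = x + 0.5 * self.ff1(x)
        a = self.norm_attn(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        c = self.norm_conv(x).transpose(1, 2)          # (batch, dim, frames) for Conv1d
        x = x + self.dwconv(c).transpose(1, 2)
        x = x + 0.5 * self.ff2(x)
        return self.norm_out(x)


class MaskBasedSE(nn.Module):
    """Analysis filterbank -> mask prediction network -> masking -> synthesis filterbank."""

    def __init__(self, n_filters=256, win=16, n_blocks=2):
        super().__init__()
        stride = win // 2
        self.encoder = nn.Conv1d(1, n_filters, kernel_size=win, stride=stride, bias=False)
        self.blocks = nn.ModuleList([ConformerBlock(n_filters) for _ in range(n_blocks)])
        self.mask = nn.Sequential(nn.Linear(n_filters, n_filters), nn.Sigmoid())
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size=win, stride=stride, bias=False)

    def forward(self, wav):                            # wav: (batch, 1, samples)
        feats = torch.relu(self.encoder(wav))          # (batch, n_filters, frames)
        x = feats.transpose(1, 2)                      # (batch, frames, n_filters)
        for blk in self.blocks:
            x = blk(x)
        mask = self.mask(x).transpose(1, 2)            # bounded mask on encoder features
        return self.decoder(feats * mask)              # enhanced waveform


noisy = torch.randn(2, 1, 16000)                       # two 1-second utterances at 16 kHz
enhanced = MaskBasedSE()(noisy)                        # enhanced waveform, same shape as the input here
```

The sketch keeps Conv-TasNet's mask-and-resynthesize structure but swaps its temporal convolutional mask network for Conformer-style blocks, which is the integration the title describes; block counts, dimensions, and the attention mechanism differ in the actual paper.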

Cited by 1 publication (1 citation statement)
References 28 publications (54 reference statements)
“…Transformers have also been applied to some audio modeling tasks. For example, Transformer-based models have been used for audio classification (Gong et al., 2021; Verma & Berger, 2021), captioning (Mei et al., 2021), compression (Dieleman et al., 2021), speech recognition (Gulati et al., 2020), speaker separation (Subakan et al., 2021), and enhancement (Koizumi et al., 2021). Transformers have also been used for generative audio models (Verma & Chafe, 2021), which in turn have enabled further tasks in music understanding (Castellon et al., 2021).…”
Section: Transformers For Sequence Modeling
confidence: 99%