ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746020

Uformer: A Unet Based Dilated Complex & Real Dual-Path Conformer Network for Simultaneous Speech Enhancement and Dereverberation

citations
Cited by 32 publications
(17 citation statements)
references
References 25 publications
“…The work [17] first applies the real-valued conformer to a two-stage modelling scheme. An extended work was proposed in [19] for a semi-complex model. In this work, we develop a fully complex model with a much simpler architecture.…”
Section: Complex Dual-path Conformer Block
mentioning
confidence: 99%
“…The outputs of the two decoders are weighted and summed. erate on the raw waveform of speech signals and the time-frequency (TF) domain approaches [10][11][12][13][14][15][16][17][18][19][20][21] that manipulate the speech spectrogram are proposed. Although the time-domain approaches have made some success, the TF domain approach has dominated the research trend.…”
Section: Introduction
mentioning
confidence: 99%
“…In recent years, there has been a similar trend of conventional speech dereverberation approaches [24]- [27] such as WPE evolving into their current DNN based variants. These include: a) the DNN-WPE [22], [23] method, which uses neural network estimated target signal PSD matrices in place of those traditionally obtained using maximum likelihood estimation trained complex value Gaussian Mixture Models [24] in the dereverberation filter estimation; and b) complex spectral masking [28], [29] and spectral mapping [30], [31] learning a transformation between reverberant and anechoic data.…”
Section: Introduction
mentioning
confidence: 99%
“…A complex-valued U-Net was proposed [11], causing attention to shift to phase-aware networks [12]. Showing state-of-the-art performance, complex-valued approaches were further developed [13], including transformer-based U-Networks [14]. The U-Network structure is similar to an autoencoder, yet the use of probabilistic latent space models similar to variational autoencoders, was previously only used for image segmentation tasks [15].…”
mentioning
confidence: 99%