2021
DOI: 10.48550/arxiv.2106.12271
Preprint

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

Xiaoyu Bie,
Simon Leglaive,
Xavier Alameda-Pineda
et al.

Abstract: Dynamical variational auto-encoders (DVAEs) are a class of deep generative models with latent variables, dedicated to time series data modeling. DVAEs can be considered extensions of the variational auto-encoder (VAE) that include the modeling of temporal dependencies between successive observed and/or latent vectors in data sequences. Previous work has shown the benefit of DVAEs and their better performance over the VAE for speech signal (spectrogram) modeling. Independently, the VAE has been successfully…
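The key difference the abstract describes, a temporal prior linking successive latent vectors, can be illustrated with a toy linear generative sketch. All dimensions, maps, and noise scales below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only)
latent_dim, obs_dim, seq_len = 2, 4, 50

# Random linear maps standing in for the learned networks
A = 0.9 * np.eye(latent_dim)                # latent transition: models temporal dependency
C = rng.normal(size=(obs_dim, latent_dim))  # decoder/observation map

z = np.zeros(latent_dim)
xs = []
for t in range(seq_len):
    # In a DVAE, z_t depends on z_{t-1}; in a plain VAE the latents are i.i.d.
    z = A @ z + rng.normal(scale=0.1, size=latent_dim)
    # Each observation x_t is generated from the current latent z_t, as in a VAE decoder
    x = C @ z + rng.normal(scale=0.05, size=obs_dim)
    xs.append(x)

X = np.stack(xs)  # (seq_len, obs_dim) sequence of generated "spectrogram frames"
```

With `A = 0` this sketch collapses to a standard VAE generative model with independent latents; the nonzero transition is what gives the DVAE its sequence-modeling capacity.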

Cited by 3 publications
References 45 publications
“…Thus, they are trained solely to generate clean speech and are therefore considered more robust to different acoustic environments than their discriminative counterparts. In fact, generative approaches have been shown to perform better under mismatched training and test conditions [8,11,12,13]. However, they are currently less studied and still lag behind discriminative approaches, which is a strong incentive to conduct more research to realize their full potential.…”
Section: Forward Process
confidence: 99%
“…The focus of the present paper is rather to advance the understanding of deep generative modeling of speech signals, while comparing fairly with highly specialized traditional systems, whose signal models are generally designed specifically for the task at hand. Moreover, improving the interpretability and control of the VAE latent space could benefit downstream tasks, for instance by enabling pitch-informed extensions of VAE-based speech enhancement methods such as those of Bando et al. (2018); Leglaive et al. (2018, 2020); Bie et al. (2021).…”
Section: Introduction
confidence: 99%
“…Although these approaches have been shown to provide superior noise-free utterances in stationary noisy environments, they perform poorly when estimating noise with non-stationary statistics. To tackle this issue, several researchers have used deep-learning-based models to develop unsupervised SE systems [23,24,25,26]. One example is the cycle-consistent GAN (CycleGAN), which was originally proposed for unpaired image-to-image translation [27] and has been successfully applied to voice conversion [28] and ASR [29].…”
Section: Introduction
confidence: 99%