A Statistically Principled and Computationally Efficient Approach to Speech Enhancement Using Variational Autoencoders

Pariente, Manuel; Deleforge, Antoine

doi:10.21437/interspeech.2019-1398

Cited by 21 publications

(18 citation statements)

References 24 publications

“…To reduce the computational cost, we previously proposed to exploit the pretrained encoder of a CVAE as an approximate posterior estimator to infer the latent space variable z in [1]. With the same motivation, a fast algorithm for estimating the parameters of the VAE-NMF model was later derived based on the Bayesian inference in [38] for single-channel speech enhancement.…”

Section: VAE-based Methods (mentioning)

confidence: 99%

FastMVAE: A Fast Optimization Algorithm for the Multichannel Variational Autoencoder Method

Kameoka

Inoue

et al. 2020

IEEE Access

“…where $\phi^{\text{FFNN}}_{\text{enc}}(\cdot\,;\theta_{\text{enc}}) : \mathbb{C}^F \mapsto \mathbb{R}^L \times \mathbb{R}_+^L$ denotes the output of an FFNN. Such an architecture was used in [8,9,10,11,12,13,14]. This is the only case where, from the approximate posterior, we can sample all latent vectors in parallel for all time frames, without further approximation.…”

Section: Training (mentioning)

confidence: 99%

A Recurrent Variational Autoencoder for Speech Enhancement

Leglaive

Alameda-Pineda

Girin

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm where the encoder of the RVAE is fine-tuned at test time, to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces a posterior temporal dynamic over the latent variables, which is shown to improve the speech enhancement results.

Index Terms: speech enhancement, recurrent variational autoencoders, nonnegative matrix factorization, variational inference.
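The Leglaive et al. statement above explains why a feed-forward encoder allows latent vectors for all time frames to be sampled in parallel: it maps each STFT frame in $\mathbb{C}^F$ to a mean in $\mathbb{R}^L$ and a variance in $\mathbb{R}_+^L$ independently of every other frame. The following NumPy sketch illustrates that property under stated assumptions. It uses a hypothetical one-hidden-layer encoder with random weights, log-power input features, and the made-up helper names `ffnn_encoder` and `sample_latents`. It is not the architecture used in any of the cited papers.

```python
import numpy as np

def ffnn_encoder(X, W1, b1, W_mu, b_mu, W_lv, b_lv):
    """Hypothetical one-hidden-layer FFNN encoder (illustration only).

    X: complex STFT of shape (F, N); each column (time frame) is encoded independently.
    Returns the mean and variance of q(z_n | x_n), each of shape (L, N).
    """
    feats = np.log(np.abs(X) ** 2 + 1e-8)    # (F, N) log-power features (assumed input)
    h = np.tanh(W1 @ feats + b1[:, None])     # (H, N) hidden layer, all frames at once
    mu = W_mu @ h + b_mu[:, None]             # (L, N) posterior means
    var = np.exp(W_lv @ h + b_lv[:, None])    # (L, N) posterior variances, always positive
    return mu, var

def sample_latents(mu, var, rng):
    """Reparameterized draw z = mu + sqrt(var) * eps for every frame in one vectorized call."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(var) * eps

rng = np.random.default_rng(0)
F, N, H, L = 257, 100, 128, 16                # hypothetical sizes
X = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))
params = (
    0.01 * rng.standard_normal((H, F)), np.zeros(H),
    0.01 * rng.standard_normal((L, H)), np.zeros(L),
    0.01 * rng.standard_normal((L, H)), np.zeros(L),
)
mu, var = ffnn_encoder(X, *params)
Z = sample_latents(mu, var, rng)
print(Z.shape)  # (16, 100)
```

Because each column of `X` is processed on its own, encoding a single frame yields the same mean and variance as the corresponding column of the batched result. A recurrent encoder such as the RVAE's carries information from frame $n-1$ to frame $n$, so its latent vectors would not be obtainable in one vectorized call like this.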

“…for all TF bins (f, n). Similarly as done in the previous works [5][6][7][8][9][10], we use an unsupervised NMF-based Gaussian noise model that assumes independence across TF bins:…”

Section: VAE-MM Inference and Learning (mentioning)

confidence: 99%

Robust Unsupervised Audio-Visual Speech Enhancement Using a Mixture of Variational Autoencoders

Sadeghi

Alameda-Pineda

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Recently, an audio-visual speech generative model based on a variational autoencoder (VAE) has been proposed, which is combined with a nonnegative matrix factorization (NMF) model of the noise variance to perform unsupervised speech enhancement. When the visual data are clean, speech enhancement with the audio-visual VAE performs better than with an audio-only VAE trained on audio data alone. However, the audio-visual VAE is not robust to noisy visual data, e.g., video frames in which the speaker's face is not frontal or the lip region is occluded. In this paper, we propose a robust unsupervised audio-visual speech enhancement method based on a per-frame VAE mixture model. This mixture model consists of a trained audio-only VAE and a trained audio-visual VAE. The motivation is to skip noisy visual frames by switching to the audio-only VAE model. We present a variational expectation-maximization method to estimate the parameters of the model. Experiments show the promising performance of the proposed method.
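The Sadeghi and Alameda-Pineda statement above refers to an unsupervised NMF-based Gaussian noise model that is independent across time-frequency bins. A common form of this model in the VAE-NMF literature, assumed here because the quoted equation is cut off, is $b_{fn} \sim \mathcal{N}_c\big(0, (\mathbf{W}\mathbf{H})_{fn}\big)$ with nonnegative $\mathbf{W} \in \mathbb{R}_+^{F\times K}$ and $\mathbf{H} \in \mathbb{R}_+^{K\times N}$. The following NumPy sketch evaluates the log-likelihood of a noise STFT under that assumed model. The function names, the rank $K$, and the sizes are hypothetical.

```python
import numpy as np

def nmf_noise_variance(W, H):
    """Assumed NMF noise variance v_fn = (W H)_fn, with W (F, K) and H (K, N) nonnegative."""
    return W @ H

def noise_log_likelihood(B, W, H):
    """Log-likelihood of a noise STFT B under b_fn ~ N_c(0, (W H)_fn), independent over (f, n).

    The circular complex Gaussian density is p(b) = exp(-|b|^2 / v) / (pi * v);
    independence across TF bins turns the joint log-density into a sum over all bins.
    """
    V = nmf_noise_variance(W, H)
    return np.sum(-np.log(np.pi * V) - np.abs(B) ** 2 / V)

rng = np.random.default_rng(0)
F, N, K = 257, 100, 8                          # hypothetical sizes and NMF rank
W = rng.random((F, K)) + 0.1                   # strictly positive, so V > 0 everywhere
H = rng.random((K, N)) + 0.1
V = nmf_noise_variance(W, H)
# Circular complex Gaussian draw: real and imaginary parts each have variance V / 2.
B = np.sqrt(V / 2) * (rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N)))
print(noise_log_likelihood(B, W, H))
```

For a fixed observed power $|b_{fn}|^2$, each term $-\log(\pi v_{fn}) - |b_{fn}|^2 / v_{fn}$ is maximized at $v_{fn} = |b_{fn}|^2$. This is why fitting $\mathbf{W}$ and $\mathbf{H}$ by maximum likelihood drives $\mathbf{W}\mathbf{H}$ toward the noise power spectrogram.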
