ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414605

Enhancing into the Codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders

Abstract: Audio codecs based on discretized neural autoencoders have recently been developed and shown to provide significantly higher compression levels for comparable quality speech output. However, these models are tightly coupled with speech content, and produce unintended outputs in noisy conditions. Based on VQ-VAE autoencoders with WaveRNN decoders, we develop compressor-enhancer encoders and accompanying decoders, and show that they operate well in noisy conditions. We also observe that a compressor-enhancer mod…
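The abstract builds on VQ-VAE autoencoders, whose discrete bottleneck replaces each encoder output vector with its nearest entry in a learned codebook, so that only the entry's integer index has to be stored or transmitted. The following is a minimal sketch of that lookup step in plain Python, assuming an already-trained codebook and squared Euclidean distance. The function name `quantize` and the toy codebook and frames are hypothetical illustrations, not taken from the paper; the sketch omits training (codebook and commitment losses, straight-through gradients), the compressor-enhancer encoder, and the WaveRNN decoder.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Vector],
             codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Map each encoder output frame to its nearest codebook entry.

    Hypothetical helper for illustration. Returns the integer indices
    (the compressed payload a codec would transmit) and the quantized
    vectors (what a decoder would consume after looking the indices up).
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Nearest-neighbour search over the whole codebook.
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy 2-D codebook with 4 entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]
indices, quantized = quantize(frames, codebook)
print(indices)  # [0, 3, 2]
```

In a codec built this way, the index list is what crosses the channel; the receiver holds the same codebook and recovers the quantized vectors before decoding them to audio.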

Cited by 14 publications (3 citation statements)
References: 15 publications

“…Another aspect of content privacy is related to speech codecs and compression. A new generative DNN architecture was introduced in [14] and [15], independently, with different speech technologies in mind. The architecture is a dual-encoder vector quantized variational autoencoder (VQ-VAE) that learns to disentangle speech content and speaker identity information in the speech signal while simultaneously creating a discrete and compressed representation of the speech.…”
Section: Recent Work; citation type: mentioning (confidence: 99%)

“…The architecture is a dual-encoder vector quantized variational autoencoder (VQ-VAE) that learns to disentangle speech content and speaker identity information in the speech signal while simultaneously creating a discrete and compressed representation of the speech. In [14] the goal was to use VQ-VAE to compress the speech signal and enhance it by removing unwanted noise. They measured compression rate as bits/sec as well as human judgements of the enhanced speech naturalness.…”
Section: Recent Work; citation type: mentioning (confidence: 99%)
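The citing work notes that compression was measured in bits per second. For a VQ-VAE code stream sent with fixed-length codes, that rate follows from the frame rate and codebook size K: bits/s = (sample rate / hop size) × (number of codebooks) × ⌈log2 K⌉. The sketch below assumes that fixed-length scheme and also estimates the empirical entropy of an index stream, a rough indication of what an entropy coder might achieve under an i.i.d. assumption. The helper names and the 16 kHz / 64-sample hop / 256-entry codebook values are hypothetical examples, not the configuration reported in the paper.

```python
import math
from collections import Counter
from typing import Sequence


def fixed_length_bitrate(sample_rate_hz: int, hop_samples: int,
                         codebook_size: int, num_codebooks: int = 1) -> float:
    """Bits per second when every codebook index uses a fixed-length code.

    Each quantized frame covers `hop_samples` input samples, so the frame rate
    is sample_rate_hz / hop_samples; each frame carries `num_codebooks` indices
    of ceil(log2(codebook_size)) bits each.
    """
    frames_per_second = sample_rate_hz / hop_samples
    bits_per_frame = num_codebooks * math.ceil(math.log2(codebook_size))
    return frames_per_second * bits_per_frame


def empirical_entropy_bits(indices: Sequence[int]) -> float:
    """Average bits per index under an i.i.d. model fitted to `indices`.

    Multiplying by the frame rate gives a rough estimate of the bits/s an
    entropy coder might reach on this stream; it is not a guaranteed bound.
    """
    if not indices:
        return 0.0
    total = len(indices)
    counts = Counter(indices)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical settings, for illustration only (not the paper's configuration).
rate = fixed_length_bitrate(16_000, 64, 256)  # 250 frames/s * 8 bits
pcm_rate = 16_000 * 16                        # 16-bit PCM at 16 kHz
print(rate, pcm_rate / rate)                  # 2000.0 128.0
```

Rounding log2 K up reflects that a plain fixed-length code spends a whole number of bits per index, which is one reason codebook sizes are often chosen as powers of two.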