Interspeech 2020
DOI: 10.21437/interspeech.2020-1693

Vector-Quantized Neural Networks for Acoustic Unit Discovery in the ZeroSpeech 2020 Challenge

Abstract: In this paper, we explore vector quantization for acoustic unit discovery. Leveraging unlabelled data, we aim to learn discrete representations of speech that separate phonetic content from speaker-specific details. We propose two neural models to tackle this challenge. Both models use vector quantization to map continuous features to a finite set of codes. The first model is a type of vector-quantized variational autoencoder (VQ-VAE). The VQ-VAE encodes speech into a discrete representation from which the aud…
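The abstract describes the core operation shared by both models: mapping each continuous feature vector to the nearest entry in a finite codebook. A minimal sketch of that nearest-neighbour quantization step (NumPy; the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def quantize(features, codebook):
    """Map each continuous feature frame to its nearest codebook entry.

    features: (T, D) array of frame-level features.
    codebook: (K, D) array of learned code vectors.
    Returns (T,) code indices and the (T, D) quantized features.
    """
    # Squared Euclidean distance from every frame to every code vector.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]
```

In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder, and the non-differentiable argmin is typically bypassed with a straight-through gradient estimator; the sketch above shows only the inference-time lookup.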

Cited by 89 publications (75 citation statements)
References 29 publications
“…In this section, the state-of-the-art VQ-VAE method [18] is first described, and then we introduce the proposed LLTs model.…”
Section: Methods
confidence: 99%
“…For the baseline VQ-VAE method, we chose the same network architecture in [18], which is a quite competitive model in VQ-VAE based voice conversion. The network architecture of the proposed LLTs method is illustrated in Fig 3, which is similar to the baseline model except for the multi-head attention module.…”
Section: Network Architecture and Implementation Details
confidence: 99%
“…To this end an optimal cluster assignment problem was solved for each training sample. This approach was refined in [18] for phoneme segmentation in a VQ-VAE and VQ-CPC [19]. Finally, [20] have proposed to use slowness penalty and run-length encoding of the latent representation of a VQ-VAE.…”
Section: Related Work
confidence: 99%
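The quote above mentions run-length encoding of a VQ-VAE's latent representation: consecutive repeats of the same discrete code are collapsed into (code, length) pairs. A minimal sketch of that encoding (function name is illustrative):

```python
from itertools import groupby

def run_length_encode(codes):
    """Collapse consecutive repeats in a discrete code sequence into
    (code, run_length) pairs, shortening the unit sequence."""
    return [(code, len(list(group))) for code, group in groupby(codes)]

# Example: run_length_encode([5, 5, 5, 2, 2, 7]) -> [(5, 3), (2, 2), (7, 1)]
```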
“…To ease the burden of collecting parallel utterances, nonparallel VC has been developed. There are two major nonparallel VC techniques, namely phonetic posteriorgram (PPG)-based VC methods [6,5] and autoencoderbased VC methods including those using the variational autoencoder (VAE) [7,8,9,10], vector-quantized VAE (VQ-VAE) [11,12,13,14,15], and generative adversarial network (GAN) [16,17,18,19]. For the PPG-based VC method, the PPG vector is first estimated using a preliminarily trained automatic speech recognition (ASR) system.…”
Section: Introduction
confidence: 99%