Interspeech 2021
DOI: 10.21437/interspeech.2021-1182

Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing

Cited by 20 publications
(12 citation statements)
“…Self-supervised speech models can produce continuous features that accurately capture phonetic contrasts while being invariant to nuisance factors (e.g. speaker) [12], [28]-[30]. To obtain discrete units, a self-supervised model can be coupled with a vector quantization (VQ) module, either as part of the model itself or by introducing a clustering step after training [29]-[34].…”
Section: DPDP for Acoustic Unit Discovery (mentioning)
confidence: 99%
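
The "clustering step after training" mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes the continuous features are already available as a NumPy array of shape (frames, dimensions) and uses a plain k-means written out by hand; the function names `kmeans` and `quantize`, the iteration count, and the seed are illustrative choices, not details taken from the cited work.

```python
import numpy as np

def squared_distances(features, centroids):
    # Pairwise squared Euclidean distances, shape (num_frames, num_centroids).
    diff = features[:, None, :] - centroids[None, :, :]
    return (diff ** 2).sum(axis=-1)

def kmeans(features, k, n_iter=20, seed=0):
    """Fit k centroids to continuous features (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        labels = squared_distances(features, centroids).argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def quantize(features, centroids):
    """Map each frame to the index of its nearest centroid (its discrete unit)."""
    features = np.asarray(features, dtype=float)
    return squared_distances(features, centroids).argmin(axis=1)
```

Each frame is then represented by an integer unit index rather than a continuous vector; a model with a built-in VQ module would learn the codebook during training instead.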
“…System BN [25] begins with the baseline CPC representations and applies speaker normalization before re-running k-means. An LSTM language model architecture is used.…”
Section: Submitted Systems and Results (mentioning)
confidence: 99%
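
The excerpt above does not say which form of speaker normalization was applied. As one possible reading, the sketch below standardizes features per speaker (zero mean, unit variance within each speaker) before any clustering step is re-run. The function name `speaker_normalize` and the `eps` parameter are hypothetical, and per-speaker standardization itself is an assumption rather than the cited system's documented method.

```python
import numpy as np

def speaker_normalize(features, speaker_ids, eps=1e-8):
    """Per-speaker mean/variance standardization (assumed normalization scheme).

    features:    array of shape (num_frames, dim)
    speaker_ids: array of shape (num_frames,) giving each frame's speaker label
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(features)
    for speaker in np.unique(speaker_ids):
        mask = speaker_ids == speaker
        mean = features[mask].mean(axis=0)
        std = features[mask].std(axis=0)
        # eps guards against division by zero for constant dimensions.
        normalized[mask] = (features[mask] - mean) / (std + eps)
    return normalized
```

After this step, a constant per-speaker offset or scale in the features no longer separates speakers, so a subsequent clustering is less likely to spend clusters on speaker identity.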
“…Then, an SSRL algorithm called contrastive predictive coding (CPC) [17] is applied to the data. CPC aims to produce linearly separable features that can be used to predict signal evolution over time [20,21]. CPC has already been successfully used with clustering-based approaches (e.g.…”
Section: Methods (mentioning)
confidence: 99%
“…CPC has already been successfully used with clustering-based approaches (e.g. [21][22][23]) and also produces features that separate suprasegmental properties such as speaker identities [17]. However, to the best of our knowledge, [24] is the only study so far using CPC for AL.…”
Section: Methods (mentioning)
confidence: 99%