Findings of the Association for Computational Linguistics: EMNLP 2023 (2023)
DOI: 10.18653/v1/2023.findings-emnlp.541

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units

Gallil Maimon,
Yossi Adi

Abstract: We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour, and timbre of a recording to those of a target speaker in a textless manner. Unlike DISSC, most voice conversion (VC) methods focus primarily on timbre and ignore people's unique speaking style (prosody). The proposed approach uses a pretrained, self-supervised model for encoding speech to discrete units, which makes it simple, effective, and fast to train. All conversion modules are only trained on reconstruction-like tasks, thus …
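
The abstract does not say how the discrete units are obtained or how rhythm is represented. As a rough illustration only, the sketch below assumes a common setup: each frame-level feature vector is assigned to its nearest codebook centroid, and consecutive repeats of a unit are collapsed into (unit, duration) pairs, so that a duration per unit is available as a handle on rhythm. The function names `quantize` and `run_lengths`, the toy 2-D features, and the 3-entry codebook are all hypothetical and are not taken from the paper.

```python
from itertools import groupby


def quantize(frames, centroids):
    """Map each feature frame to the index of its nearest centroid.

    Assumption: squared Euclidean distance to a fixed codebook, as in
    k-means assignment. The paper's actual encoder is not shown here.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centroids)), key=lambda k: sqdist(frame, centroids[k]))
        for frame in frames
    ]


def run_lengths(units):
    """Collapse consecutive repeated units into (unit, duration_in_frames) pairs."""
    return [(unit, len(list(group))) for unit, group in groupby(units)]


# Toy data: hypothetical 2-D "features" and a 3-entry codebook.
centroids = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.1), (0.2, 0.0), (0.9, 1.2), (3.8, 0.1), (4.1, -0.2), (4.0, 0.3)]

units = quantize(frames, centroids)
pairs = run_lengths(units)
print(units)
print(pairs)
```

Under these assumptions, changing the duration attached to each unit while keeping the unit sequence fixed would alter timing without altering content, which is one way to read "rhythm conversion" over discrete units.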

Cited by 0 publications
References 27 publications (46 reference statements)