Interspeech 2022
DOI: 10.21437/interspeech.2022-11223

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset

Abstract: Cross-speaker style transfer aims to extract the speech style of the given reference speech, which can be reproduced in the timbre of arbitrary target speakers. Existing methods on this topic have explored utilizing utterance-level style labels to perform style transfer via either global or local scale style representations. However, audiobook datasets are typically characterized by both the local prosody and global genre, and are rarely accompanied by utterance-level style labels. Thus, properly transferring …

Cited by 2 publications (1 citation statement)
References 15 publications
“…Lately a solution has been sought in speech style transfer, which means transferring the style from one signal to another while preserving the latter's content and speaker's identity. A small expressive speech corpus, possibly one with multiple speakers, can be used to train a model of the desired speech style, which can then be applied to synthesized neutral speech (see Gao et al., 2019; Kulkarni et al., 2021; Pan and He, 2021; Li et al., 2022; Ribeiro et al., 2022).…”
Section: Introduction (mentioning)
confidence: 99%