2023
DOI: 10.1016/j.compeleceng.2022.108534

VStyclone: Real-time Chinese voice style clone

citations
Cited by 1 publication
(1 citation statement)
references
References 10 publications
“…ZSE-VITS (zero-shot expressive variation inference with adversarial learning for end-to-end Text-to-Speech) made use of prosody information prediction and prosody fusion methods to improve the synthesis model performance [25]. The VStyclone model adopted a tone extractor for style extraction [26]. Chen et al proposed a neural fusion architecture, which included a text encoder, an acoustic decoder and a phoneme-level reference encoder.…”
Section: Introduction
Citation type: mentioning
confidence: 99%