Interspeech 2022
DOI: 10.21437/interspeech.2022-831

Decoupled Pronunciation and Prosody Modeling in Meta-Learning-based Multilingual Speech Synthesis

Abstract: This paper presents a method of decoupled pronunciation and prosody modeling to improve the performance of meta-learning-based multilingual speech synthesis. The baseline meta-learning synthesis method adopts a single text encoder with a parameter generator conditioned on language embeddings and a single decoder to predict mel-spectrograms for all languages. In contrast, our proposed method designs a two-stream model structure that contains two encoders and two decoders for pronunciation and prosody modeling, r…
