10th ISCA Workshop on Speech Synthesis (SSW 10) 2019
DOI: 10.21437/ssw.2019-8

Multi-Speaker Modeling for DNN-based Speech Synthesis Incorporating Generative Adversarial Networks

Abstract: This paper presents a novel DNN-based speech synthesis method we derived from multi-speaker training data. In general, speaker-dependent modeling techniques based on generative adversarial networks (GANs) improve synthesized speech quality. However, they are inadequate for multi-speaker training because conventional discriminators cannot take into account speaker identity, which degrades anti-spoofing performance in GAN discriminators. We introduce two approaches as means to learn GAN speaker characteristics, …

Cited by 2 publications (1 citation statement)
References 21 publications (29 reference statements)
“…Therefore, we have achieved high-quality speech synthesis even with a relatively small amount of speech data of the desired speaker [2]. Additionally, by combining this system with generative adversarial networks that have been found to be effective in image generation and other tasks, we have achieved improvements in the quality of synthesized speech and in the reproducibility of a speaker's voice [3].…”
Section: DNN Speech-synthesis Technology For Reproducing Diverse Spea... (mentioning)
confidence: 99%