Computer Graphics and Imaging / 798: Signal Processing, Pattern Recognition and Applications 2013
DOI: 10.2316/p.2013.798-069

Structural KLD for Cross-Variety Speaker Adaptation in HMM-based Speech Synthesis

Cited by 5 publications (3 citation statements)
References 15 publications
“…A well-known method for data augmentation is speaker adaptation, where the most common approach is to build an average voice model of multiple speakers and then adapt a model for new (target) speaker from it. Speaker adaptation is a well-researched topic in HMM-based speech synthesis [4,5,6,7,8,9] but still relatively unexplored for DNN-based synthesis. Arik et al [10] found that speaker adaptation by fine-tuning (i.e.…”
Section: Data Augmentation (mentioning)
confidence: 99%
“…Typical applications of cross-lingual technologies include adapting speaker models between different dialects, accents, or variants of languages [12,13]. We believe that we can consider different expressiveness or speaking styles in the same fashion, and use our proposed transplantation technique combined with cross-language technologies with the purpose of transplanting paralinguistic features between languages.…”
Section: Cross-lingual Emotion Transplantation (mentioning)
confidence: 99%
“…As soon as the dialect/sociolect of the user is detected, we can use interpolation to create a dialog system persona that fits the dialect/sociolect spoken by the user. In Toman et al (2013a), we presented a method for cross-variety speaker transformation based on HSMM state mapping (Wu et al, 2009). Transforming the voice of a speaker from one variety to another can be used as a basis for dialect interpolation.…”
Section: Introduction (mentioning)
confidence: 99%