2012
DOI: 10.1109/tasl.2012.2187195

Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

Cited by 82 publications (59 citation statements)
References 25 publications
“…Cluster Adaptive Training (CAT) was originally developed for speech recognition to enable rapid speaker adaptation [8]. And in [9] CAT has been extended for statistical parametric synthesis to perform the speaker and language factorization. The CAT model consists of cluster of models and transformation is employed to represent the specific target model.…”
Section: Alternative Statistical Parametric Models (mentioning)
confidence: 99%
“…In contrast to concatenative synthesis [15], which stores speech waveforms, the parametric representation in SPSS has several potential advantages, including flexibility in changing voice characteristics [3], speaker and style adaptation [16][17][18][19], easier multilingual support [20][21][22], superior coverage of acoustic space [3], reduced memory footprint [3], and better robustness to low-quality speech recordings [23].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, working with average voice models is difficult for under-resourced languages since building such general model needs remarkable efforts to design, record, and transcribe a thorough multi-speaker speech database [3]. To alleviate the data sparsity problem in under-resourced languages, speaker and language factorization (SLF) technique can be used [34]. SLF attempts to factorize speaker-specific and language-specific characteristics in training data and then model them using different transforms.…”
Section: Introduction (mentioning)
confidence: 99%