Interspeech 2016
DOI: 10.21437/interspeech.2016-345

Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data

Abstract: Cross-lingual speaker adaptation with limited adaptation data has many applications such as use in speech-to-speech translation systems. Here, we focus on cross-lingual adaptation for statistical speech synthesis (SSS) systems using limited adaptation data. To that end, we propose two techniques exploiting a bilingual Turkish-English speech database that we collected. In one approach, speaker-specific state-mapping is proposed for cross-lingual adaptation which performed significantly better than the baseline … Show more



Cited by 4 publications (6 citation statements)
References: 15 publications
“…There have been previous attempts to train generic voice models from different perspectives: polyglot synthesis [17]–[28], code-mixing [23], [26], [29]–[36], cross-lingual voice conversion [37], [38], and data augmentation [21], [22], [25], [26], [28], [39]–[44]. Polyglot synthesis aims to synthesise texts of multiple languages in the voice of a single speaker.…”
Section: Related Work
confidence: 99%
“…These resulting values (n = 880) were used for analysis. [Table excerpt, partly recoverable: study counts per synthesis paradigm; the first row's label is lost in extraction. (…) [6]–[12]; Hidden Markov Model synthesis (HMM), 7 studies [12]–[18]; Neural network (non-S2S) synthesis (DNN), 9 studies [19]–[27]; Sequence-to-sequence synthesis (S2S), …]”
Section: Characteristics of the Included Studies
confidence: 99%
“…As such a measure did not yet exist, we created one. This measure, hereafter referred to as the MultiLingual Model Effect (MLME), was derived from the reported results as follows: [18], [21], [27]
[Table excerpt, reconstructed as "measure: category [studies]"; the reference list above belongs to a row truncated by the excerpt.]
F0 RMSE (Root-Mean-Square Error): Acoustics [12], [18]
V/UV (Voiced/UnVoiced error): Acoustics [12], [18]
BAP (Band APeriodicities distortion): Acoustics [18]
DTWMCD (Dynamic Time Warping Mel-Cepstral Distortion): Acoustics [19]
L2 NSE (L2 Norm-Squared on log-Mel spectrogram): Acoustics [24]
LF0 RMSE (Log F0 Root-Mean-Square Error): Acoustics [11]
LSD (normalized Log Cepstral Distance): Acoustics [12]
MGC RMSE (Mel-Generalized Cepstrum coefficients Root-Mean-Square Error): Acoustics [11]
MSE (Mean-Square Error): Acoustics [13]
CER (Character Error Rate): Intelligibility [21], [26]
Intelligibility % (percentage of intelligible sentences): Intelligibility [7]
SUS-Wacc (Word accuracy in Semantically Unpredictable Sentences): …”
Section: Multilingual Model Effect (MLME)
confidence: 99%
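The excerpt above lists standard objective scores for synthetic speech. As an illustration only (not code from the cited paper), here is a minimal sketch of two of them, F0 RMSE and V/UV error, computed from paired per-frame F0 tracks where a value of 0 marks an unvoiced frame; the function name and the toy F0 values are hypothetical, and NumPy is assumed available:

```python
import numpy as np

def f0_rmse_and_vuv_error(f0_ref, f0_syn):
    """Return (F0 RMSE in Hz over frames voiced in both tracks,
    V/UV error as the fraction of frames whose voicing decisions
    disagree). Frames with F0 == 0 are treated as unvoiced."""
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_syn = np.asarray(f0_syn, dtype=float)
    voiced_ref = f0_ref > 0
    voiced_syn = f0_syn > 0
    # V/UV error: mismatched voiced/unvoiced decisions.
    vuv_error = float(np.mean(voiced_ref != voiced_syn))
    # F0 RMSE: only frames both tracks agree are voiced.
    both = voiced_ref & voiced_syn
    if both.any():
        rmse = float(np.sqrt(np.mean((f0_ref[both] - f0_syn[both]) ** 2)))
    else:
        rmse = float("nan")
    return rmse, vuv_error

# Toy example with made-up F0 tracks (Hz); 0 marks unvoiced frames.
ref = [120.0, 130.0, 0.0, 0.0, 140.0]
syn = [118.0, 133.0, 0.0, 110.0, 141.0]
rmse, vuv = f0_rmse_and_vuv_error(ref, syn)
```

In practice these scores are computed over time-aligned frames (hence the DTW variant of MCD in the table when no frame alignment exists).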