Interspeech 2017
DOI: 10.21437/interspeech.2017-37

Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages

Abstract: Acquiring data for text-to-speech (TTS) systems is expensive: these systems typically require large amounts of training data, which is not available for low-resourced languages. Sometimes small amounts of data can be collected, but often no data is available at all. This paper presents an acoustic modeling approach utilizing long short-term memory (LSTM) recurrent neural networks (RNNs), aimed at partially addressing the language data scarcity problem. Unlike speaker-adaptation systems that aim to preserve speaker…
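
The abstract describes a single ("uniform") LSTM acoustic model shared across languages and speakers. As a rough sketch of that idea only (not the paper's actual architecture; the layer sizes, feature dimensions, and learned language/speaker embeddings below are assumptions), such a model might look like:

```python
# Hypothetical sketch of a uniform multilingual multi-speaker LSTM
# acoustic model; all names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class UniformMLMSAcousticModel(nn.Module):
    def __init__(self, ling_dim=300, n_langs=5, n_speakers=20,
                 emb_dim=16, hidden=256, acoustic_dim=187):
        super().__init__()
        # Learned codes let one shared network condition on language and speaker.
        self.lang_emb = nn.Embedding(n_langs, emb_dim)
        self.spk_emb = nn.Embedding(n_speakers, emb_dim)
        self.lstm = nn.LSTM(ling_dim + 2 * emb_dim, hidden,
                            num_layers=2, batch_first=True)
        # One shared output layer predicting per-frame vocoder parameters
        # (e.g., spectral, F0, aperiodicity, and voicing features).
        self.out = nn.Linear(hidden, acoustic_dim)

    def forward(self, ling_feats, lang_id, spk_id):
        # ling_feats: (batch, frames, ling_dim) frame-level linguistic features
        b, t, _ = ling_feats.shape
        cond = torch.cat([self.lang_emb(lang_id), self.spk_emb(spk_id)], dim=-1)
        cond = cond.unsqueeze(1).expand(b, t, -1)  # repeat codes on every frame
        h, _ = self.lstm(torch.cat([ling_feats, cond], dim=-1))
        return self.out(h)
```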

Cited by 20 publications (25 citation statements). References 26 publications.

“…Previous research on multi-lingual multi-speaker (MLMS) statistical parametric speech synthesis (SPSS) has discussed using high-resource languages to help construct TTS systems for low-resource languages. Some research shows that the model trained on multiple languages can benefit from cross-lingual information and aid the adaptation to new languages using only a small amount of data [13,14]. In their methods, linguistic inputs of each language are converted internally into language-independent representations.…”
Section: Introduction (mentioning)
confidence: 99%
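
The statement above describes converting each language's linguistic inputs into language-independent representations. A toy illustration of one such scheme, mapping language-specific phone labels onto a shared IPA-style inventory (the mappings are invented for illustration, not taken from the cited papers):

```python
# Toy example: map language-specific phone labels to one shared inventory
# so a single model can be trained on data from several languages.
# These mappings are illustrative assumptions, not from the cited work.
LANG_TO_SHARED = {
    "en": {"AA": "a", "IY": "i", "K": "k", "SH": "ʃ", "pau": "sil"},
    "de": {"a:": "a", "i:": "i", "k": "k", "S": "ʃ", "sil": "sil"},
}

def to_shared_phones(phones, lang):
    """Convert a language-specific phone sequence to the shared inventory."""
    table = LANG_TO_SHARED[lang]
    return [table.get(p, "sil") for p in phones]  # unknown phones fall back to silence

# Both sequences now index one phone set, so one model can consume both.
print(to_shared_phones(["K", "AA", "SH"], "en"))  # ['k', 'a', 'ʃ']
print(to_shared_phones(["k", "a:", "S"], "de"))   # ['k', 'a', 'ʃ']
```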
“…These resulting values (n = 880) were used for analysis. [Flattened table of synthesis approaches among the included studies, partially recoverable from the snippet: a preceding category citing [6]–[12]; Hidden Markov Model synthesis (HMM), 7 studies [12]–[18]; neural network (non-S2S) synthesis (DNN), 9 studies [19]–[27]; sequence-to-sequence synthesis (S2S), truncated]…”
Section: Characteristics of the Included Studies (mentioning)
confidence: 99%
“…[Flattened table of evaluation measures, partially recoverable from the snippet: a metric whose name is truncated, measuring Intelligibility [23]; WER (Word Error Rate), Intelligibility [20]; MOS (Mean Opinion Score), Naturalness/Quality [6], [8], [9], [14]–[18], [21], [23]; A/B Preference (preference rate between test and control), Quality [5], [10]–[13]]…”
Section: Multilingual Model Effect (MLME) (mentioning)
confidence: 99%
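
Of the metrics listed above, WER is the one with a standard mechanical definition: word-level Levenshtein (edit) distance normalized by reference length. A minimal implementation for reference:

```python
# Minimal word error rate (WER): edit distance over words divided by
# the number of reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sit"))  # 0.333... (one substitution of three words)
```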
“…Previous work exploring factorized multilingual, multi-speaker neural models has been proposed in [12] and [14]. The model in [12] in turn builds on the model proposed in [13], where data is factorized across speakers by a separate speaker-partitioned layer.…”
Section: Previous Approaches (mentioning)
confidence: 99%
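
A speaker-partitioned output layer of the kind described above can be sketched as a shared network body with one output head per speaker; the names and sizes here are assumptions, not the cited papers' code:

```python
# Hedged sketch: per-speaker output heads over a shared hidden representation,
# so speaker-specific variation is absorbed by the partitioned layer.
import torch.nn as nn

class SpeakerPartitionedHead(nn.Module):
    def __init__(self, hidden=256, acoustic_dim=187, n_speakers=20):
        super().__init__()
        # One linear output layer per speaker; the body feeding them is shared.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, acoustic_dim) for _ in range(n_speakers)]
        )

    def forward(self, h, spk_id):
        # h: (frames, hidden) hidden states for one utterance;
        # route the whole utterance through its own speaker's head.
        return self.heads[spk_id](h)
```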
“…Thus, it allows us to explore various tying and weight-sharing schemes between similar languages when data is scarce, such as tying Assamese, Nepali, and Bengali, while also allowing the use of a single model across all languages. Similarly, the model in [14] builds on this same multi-lingual, multi-speaker (MLMS) model for low-resource languages. However, both of these models still rely on frame-based linguistic features.…”
Section: Previous Approaches (mentioning)
confidence: 99%
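
The tying scheme mentioned in this last excerpt (e.g., Assamese, Nepali, and Bengali sharing weights) can be illustrated by routing related languages to one shared output layer; the grouping below is an invented example, not the cited papers' actual configuration:

```python
# Illustrative weight tying between similar languages: tied languages
# resolve to the same output layer, so their scarce data trains shared weights.
import torch.nn as nn

# Invented grouping for illustration only.
LANG_GROUP = {"as": "indic_east", "ne": "indic_east", "bn": "indic_east",
              "hi": "indic_central"}

class TiedLanguageHeads(nn.Module):
    def __init__(self, hidden=256, acoustic_dim=187):
        super().__init__()
        groups = sorted(set(LANG_GROUP.values()))
        self.heads = nn.ModuleDict(
            {g: nn.Linear(hidden, acoustic_dim) for g in groups}
        )

    def forward(self, h, lang):
        # Assamese ("as"), Nepali ("ne"), and Bengali ("bn") all resolve
        # to the same head, so they share one weight matrix.
        return self.heads[LANG_GROUP[lang]](h)
```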