10th ISCA Workshop on Speech Synthesis (SSW 10) 2019
DOI: 10.21437/ssw.2019-35

Building Multilingual End-to-End Speech Synthesisers for Indian Languages

Abstract: Building text-to-speech (TTS) synthesisers is a difficult task, especially for low resource languages. Language-specific modules need to be developed for system building. End-to-end speech synthesis has become a popular paradigm as a TTS can be trained using only text-audio pairs. However, end-to-end speech synthesis is not scalable in a multilanguage scenario, as the vocabulary increases with the number of different scripts. In this paper, TTSes are trained for Indian languages using two text representatio…

Citations: cited by 18 publications (19 citation statements)
References: 16 publications
“…We obtained MLME values from the following studies: [6], [12], [13], [14], [16], [17], [18], [20], [22], [25], [26], and [27], and reported them in Table 3, both as a whole and in specific groups of evaluation metrics, in the form of median (M) and interquartile range (IQR). Also reported are the p-values of the corresponding one-sample Wilcoxon signed rank tests for the hypothesis that the median MLME values are larger than 0.…”
Section: Results (mentioning)
confidence: 99%
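
The summary described in this citation statement (median, interquartile range, and a one-sample Wilcoxon signed rank test of whether the median is larger than 0) can be sketched as follows. This is a minimal standard-library illustration, not the citing authors' code: the function names are hypothetical, the input values are made-up placeholders rather than data from any study, and the exact p-value is obtained by brute-force enumeration, which is only practical for small samples.

```python
from itertools import product
from statistics import median, quantiles


def summarize(values):
    """Return (median, IQR) using inclusive, linearly interpolated quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q3 - q1


def average_ranks(magnitudes):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i])
    ranks = [0.0] * len(magnitudes)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and magnitudes[order[j + 1]] == magnitudes[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_greater(values):
    """One-sample signed rank test, H1: median > 0. Returns (W+, exact p)."""
    nonzero = [v for v in values if v != 0]  # zeros carry no sign information
    ranks = average_ranks([abs(v) for v in nonzero])
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    # Under H0 each rank is positive with probability 1/2, so enumerate
    # all sign patterns and count those at least as extreme as observed.
    hits = sum(
        1
        for signs in product((0, 1), repeat=len(ranks))
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, hits / 2 ** len(ranks)


# Placeholder values for illustration only (assumption, not real data).
sample = [0.4, 0.1, -0.2, 0.7, 0.3, 0.5, -0.05, 0.6]
med, iqr = summarize(sample)
w_plus, p_value = wilcoxon_greater(sample)
print(med, iqr, w_plus, p_value)
```

For larger samples, a library routine with a normal approximation would be the more usual choice than enumeration.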
“…These resulting values (n = 880) were used for analysis.
[6], [7], [8], [9], [10], [11], [12]
Hidden Markov Model synthesis (HMM) | 7 | [12], [13], [14], [15], [16], [17], [18]
Neural network (non-S2S) synthesis (DNN) | 9 | [19], [20], [21], [22], [23], [24], [25], [26], [27]
Sequence-to-sequence synthesis (S2S)…”
Section: Characteristics of the Included Studies (mentioning)
confidence: 99%
“…The parser leveraged the phonetic similarity among the Indian languages to generate the lexicon. Further, [4] explored ways of merging tokens, based either on characters or on phones. Based on subjective evaluations, both approaches gave reasonable-quality speech synthesis.…”
Section: Introduction (mentioning)
confidence: 99%
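
To illustrate why the vocabulary grows with the number of scripts, and how a character-based merge of tokens can shrink it, here is a small sketch. It is an assumption-laden illustration and not the method of the paper or of the citing work: it relies only on the fact that the major Indic scripts occupy parallel 128-code-point blocks in Unicode, and the names `shared_token`, `vocabulary`, and the `INDIC_` token prefix are hypothetical.

```python
# Start of each 128-code-point Unicode block for nine Indic scripts.
INDIC_BLOCK_STARTS = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def shared_token(ch: str) -> str:
    """Map an Indic character to a script-independent token by block offset.

    Characters outside the listed blocks are returned unchanged.
    """
    code_point = ord(ch)
    for start in INDIC_BLOCK_STARTS.values():
        if start <= code_point < start + BLOCK_SIZE:
            return f"INDIC_{code_point - start:02X}"
    return ch


def vocabulary(texts, merge_scripts: bool) -> set:
    """Collect the set of input tokens, with or without cross-script merging."""
    vocab = set()
    for text in texts:
        for ch in text:
            vocab.add(shared_token(ch) if merge_scripts else ch)
    return vocab


# The consonants ka, ma, la written in Devanagari, Tamil, and Telugu.
words = ["\u0915\u092e\u0932", "\u0b95\u0bae\u0bb2", "\u0c15\u0c2e\u0c32"]
separate = vocabulary(words, merge_scripts=False)
merged = vocabulary(words, merge_scripts=True)
print(len(separate), len(merged))  # prints "9 3"
```

The block alignment is only approximate: some letters have no counterpart in other scripts, and the same offset can be pronounced differently across languages, which is one reason a phone-based merge may be preferred.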