2019
DOI: 10.1109/access.2019.2954342

Deep Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis

Abstract: This paper proposes deep learning-based Mandarin-Tibetan cross-lingual speech synthesis that realizes both Mandarin speech synthesis and Tibetan speech synthesis under a unique framework. Because a Tibetan training corpus is hard to record, we train the acoustic models with a large-scale Mandarin multispeaker corpus and a small-scale Tibetan one-speaker corpus. The acoustic models are trained with deep neural network (DNN), hybrid long short-term memory (LSTM), and hybrid bi-directional long short-term memory (BL…

Citations: cited by 19 publications (12 citation statements)
References: 22 publications (25 reference statements)
“…These resulting values (n = 880) were used for analysis. [6], [7], [8], [9], [10], [11], [12] Hidden Markov Model synthesis (HMM) 7 [12], [13], [14], [15], [16], [17], [18] Neural network (non-S2S) synthesis (DNN) 9 [19], [20], [21], [22], [23], [24], [25], [26], [27] Sequence-to-sequence synthesis (S2S)…”
Section: Characteristics of the Included Studies
Citation type: mentioning
confidence: 99%
“…From the chosen studies, notable studies include [18] (hereafter Study A), the latest included DNN-based study, which used 5 different evaluation metrics for TTS in Tibetan (with Mandarin as the source language). For S2S-based studies, among the latest are [27] (Study B), which explored TTS for Indic LRLs in the Indo-Aryan and Dravidian families, and [25] (Study C), which investigated strategies for using Dutch and other European languages to aid a limited amount of English data.…”
Section: Notable Studies
Citation type: mentioning
confidence: 99%
“…5, a structure including two different recurrent networks with the same output is capable of both forward and backward training process [36]. Proposed topology has been widely applied in the speech recognition domain due to its capability to efficiently recognize a word by using not only the previous words but also the whole sentence [37]. Motivated by the proposed principle, here, the necessary information is completely exploited by the explanatory variables…”
Section: BLSTM Network
Citation type: mentioning
confidence: 99%
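
The bidirectional idea in the statement above (two recurrent networks, one reading the sequence forward and one backward, feeding the same output) can be illustrated with a minimal sketch. It is not the cited paper's model: it assumes a one-unit tanh recurrence instead of an LSTM cell, and the function names and weights (`rnn_pass`, `bidirectional_pass`, `0.5`, `0.8`) are hypothetical values chosen only for illustration.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a one-unit tanh recurrence over `inputs`; return the state at each step."""
    state = 0.0
    states = []
    for x in inputs:
        # New state mixes the current input with the previous state.
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


def bidirectional_pass(inputs, fwd=(0.5, 0.8), bwd=(0.5, 0.8)):
    """Pair a forward-in-time state with a backward-in-time state per step.

    The weights in `fwd` and `bwd` are illustrative assumptions, not trained values.
    """
    forward = rnn_pass(inputs, *fwd)
    # Read the sequence in reverse, then flip the states back into time order.
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    return list(zip(forward, backward))


features = bidirectional_pass([1.0, 0.0, 0.0, -1.0])
for step, (f, b) in enumerate(features):
    print(step, round(f, 4), round(b, 4))
```

At each step the forward state depends only on inputs up to that step, while the backward state depends on that step and everything after it, so a layer that consumes both sees the whole sequence.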
“…The experimental results show that synthesized Tibetan speech is better than the HMM-based Mandarin-Tibetan cross-lingual speech synthesis. The work [20] trains the acoustic models with DNN, hybrid long short-term memory (LSTM), and hybrid bidirectional long short-term memory (BLSTM) and implements a deep learning-based Mandarin-Tibetan cross-lingual speech synthesis under a unique framework. Experiments demonstrated that the hybrid BLSTM-based cross-lingual speech synthesis framework was better than the Tibetan monolingual framework.…”
Section: Introduction
Citation type: mentioning
confidence: 99%