2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7472738
Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages

Abstract: Bidirectional long short-term memory (BLSTM) based speech synthesis has shown great potential in improving the quality of synthetic speech. However, for low-resource languages, it is difficult to obtain a high-quality BLSTM model. BLSTM based speech synthesis can be viewed as a transformation between the input features and the output features. We assume that the input and output layers of the BLSTM are language-dependent while the hidden layers can be language-independent if trained properly. We investigate wh…
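The shared-hidden-layer idea in the abstract (language-dependent input and output layers around language-independent hidden layers) can be sketched in a few lines. The following is a minimal, hypothetical PyTorch rendering, not the authors' implementation: the class name, layer sizes, language tags, and feature dimensions are all invented for illustration.

```python
import torch
import torch.nn as nn

class MultilingualBLSTM(nn.Module):
    """Hypothetical sketch: per-language input/output layers, shared BLSTM."""
    def __init__(self, input_dims, output_dims, hidden_dim=256, num_layers=2):
        super().__init__()
        # Language-dependent input projections, one per language.
        self.input_proj = nn.ModuleDict(
            {lang: nn.Linear(dim, hidden_dim) for lang, dim in input_dims.items()})
        # Shared hidden BLSTM stack (language-independent if trained properly).
        self.blstm = nn.LSTM(hidden_dim, hidden_dim, num_layers=num_layers,
                             batch_first=True, bidirectional=True)
        # Language-dependent output projections to acoustic features.
        self.output_proj = nn.ModuleDict(
            {lang: nn.Linear(2 * hidden_dim, dim) for lang, dim in output_dims.items()})

    def forward(self, x, lang):
        h = torch.tanh(self.input_proj[lang](x))  # (batch, time, hidden_dim)
        h, _ = self.blstm(h)                      # (batch, time, 2 * hidden_dim)
        return self.output_proj[lang](h)          # per-frame acoustic features

# Invented dimensions: 420-dim English linguistic features, 380-dim features
# for a low-resource language "lo", 187-dim acoustic output for both.
model = MultilingualBLSTM(input_dims={"en": 420, "lo": 380},
                          output_dims={"en": 187, "lo": 187})
y = model(torch.randn(4, 100, 420), lang="en")  # -> (4, 100, 187)
```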

Cited by 30 publications (22 citation statements). References 12 publications.
“…Previous research on multi-lingual multi-speaker (MLMS) statistical parametric speech synthesis (SPSS) has discussed using high-resource languages to help construct TTS systems for low-resource languages. Some research shows that the model trained on multiple languages can benefit from cross-lingual information and aid the adaptation to new languages using only a small amount of data [13,14]. In their methods, linguistic inputs of each language are converted internally into language-independent representations.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
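The adaptation step this quote alludes to (benefiting from cross-lingual information, then adapting to a new language with a small amount of data) can be sketched on top of the hypothetical MultilingualBLSTM above. The freezing strategy shown is an assumption made for illustration, not necessarily the procedure of [13,14].

```python
import torch.nn as nn

def adapt_to_new_language(model, lang, in_dim, out_dim, hidden_dim=256):
    """Add language-dependent layers for an unseen language; reuse shared BLSTM.

    `hidden_dim` must match the hidden size the model was built with.
    """
    model.input_proj[lang] = nn.Linear(in_dim, hidden_dim)
    model.output_proj[lang] = nn.Linear(2 * hidden_dim, out_dim)
    # Freeze the shared layers so a small corpus only fits the projections.
    for p in model.blstm.parameters():
        p.requires_grad = False
    # Return only the parameters that should be optimized during adaptation.
    return [p for p in model.parameters() if p.requires_grad]

trainable = adapt_to_new_language(model, "new", in_dim=350, out_dim=187)
```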
“…The acoustic model is trained on multiple languages and may never observe the target language in its training data. This type of acoustic model, the multilingual multi-speaker (MLMS) model, was proposed in [12,13,14]. These approaches utilize a large input feature space consisting of "concatenated" language-dependent components, one for each language.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
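A minimal sketch of the "concatenated" language-dependent input space the quote describes, assuming the common zero-padding convention where only the current language's slot is populated; the language set and dimensions are invented.

```python
import numpy as np

LANG_DIMS = {"en": 420, "de": 390, "lo": 380}  # hypothetical per-language sizes

def concat_input(features, lang, dims=LANG_DIMS):
    """Place `features` in its language's slot of the global input vector."""
    x = np.zeros(sum(dims.values()), dtype=np.float32)
    offset = 0
    for l, d in dims.items():
        if l == lang:
            x[offset:offset + d] = features
            break
        offset += d
    return x

x = concat_input(np.ones(390, dtype=np.float32), "de")
assert x.shape == (1190,) and x[420:810].sum() == 390.0  # only the "de" slot is set
```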
“…BLSTM-RNN is an extension of the bidirectional recurrent neural network (BRNN) architecture [49]. It replaces the units in the hidden layers of a BRNN with LSTM memory blocks.…”
Section: BLSTM (citation type: mentioning; confidence: 99%)
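The relationship the quote states, that a BLSTM-RNN is a BRNN whose hidden units are replaced by LSTM memory blocks, is visible directly in framework code. This PyTorch comparison is illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

x = torch.randn(4, 100, 64)  # (batch, time, features)

brnn = nn.RNN(64, 128, batch_first=True, bidirectional=True)    # plain BRNN
blstm = nn.LSTM(64, 128, batch_first=True, bidirectional=True)  # BLSTM-RNN

h_rnn, _ = brnn(x)    # (4, 100, 256): forward/backward states concatenated
h_lstm, _ = blstm(x)  # same shape; only the recurrent cell internals differ
```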
“…The notation of these equations is explained in [49], and φ(·) is the activation function, which can be implemented by the LSTM block using the equations in [49].…”
Section: BLSTM (citation type: mentioning; confidence: 99%)
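The equations themselves are not reproduced on this page. For reference, a standard BRNN formulation consistent with the quote, with φ(·) realized by an LSTM block, is sketched below; this reconstruction follows the common Graves-style notation and omits peephole connections, so it may differ in detail from [49].

```latex
% Forward and backward hidden sequences of a BRNN, combined at the output:
\begin{aligned}
\overrightarrow{h}_t &= \phi\!\left(W_{x\overrightarrow{h}}\, x_t
  + W_{\overrightarrow{h}\overrightarrow{h}}\, \overrightarrow{h}_{t-1}
  + b_{\overrightarrow{h}}\right) \\
\overleftarrow{h}_t &= \phi\!\left(W_{x\overleftarrow{h}}\, x_t
  + W_{\overleftarrow{h}\overleftarrow{h}}\, \overleftarrow{h}_{t+1}
  + b_{\overleftarrow{h}}\right) \\
y_t &= W_{\overrightarrow{h}y}\, \overrightarrow{h}_t
  + W_{\overleftarrow{h}y}\, \overleftarrow{h}_t + b_y
\end{aligned}

% One peephole-free LSTM block implementing \phi(\cdot):
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```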