2016
DOI: 10.1587/transinf.2016slp0011

Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis

Abstract: Building high-quality text-to-speech (TTS) systems without expert knowledge of the target language and/or time-consuming manual annotation of speech and text data is an important yet challenging research topic. In this kind of TTS system, it is vital to find a representation of the input text that is both effective and easy to acquire. Recently, the continuous representation of raw word inputs, called "word embedding", has been successfully used in various natural language processing tasks. It has also be…
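As a rough illustration of the idea in the abstract (feeding continuous word representations to a neural acoustic model in place of hand-crafted linguistic features), here is a minimal sketch with toy, randomly initialized embeddings and a one-hidden-layer regressor. All names, dimensions, and the network itself are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and pretrained-style word embeddings (illustrative only).
vocab = {"the": 0, "cat": 1, "sat": 2}
emb_dim, acoustic_dim = 8, 4
embeddings = rng.normal(size=(len(vocab), emb_dim))

# A small feed-forward regressor mapping a word's embedding to acoustic
# feature targets (e.g. spectral parameters or F0); in practice the input
# would be concatenated with phonetic context as the citing papers note.
W1 = rng.normal(size=(emb_dim, 16)) * 0.1
b1 = np.zeros(16)
W2 = rng.normal(size=(16, acoustic_dim)) * 0.1
b2 = np.zeros(acoustic_dim)

def predict_acoustics(word: str) -> np.ndarray:
    x = embeddings[vocab[word]]   # continuous word representation
    h = np.tanh(x @ W1 + b1)      # hidden layer
    return h @ W2 + b2            # acoustic feature vector

frame = predict_acoustics("cat")
print(frame.shape)  # (4,)
```

The point of the sketch is only the input substitution: the embedding lookup replaces a vector of hand-labeled features such as ToBI or POS tags.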

Cited by 6 publications (12 citation statements)
References 27 publications
“…To name a few, Wang et al. use word embeddings to substitute ToBI and POS tags in RNN-based synthesis, achieving significant system improvement. Wang et al. (2016) enhance the input to NN-based systems with continuous word embeddings, and also try to substitute the conventional linguistic input with the word embeddings. They do not achieve a performance improvement; however, when they use phrase embeddings combined with phonetic context, they do achieve a significant improvement in a DNN-based system.…”
Section: NN-based Expressive Speech Synthesis with Sentiment Embeddings
confidence: 99%
“…They do not achieve a performance improvement; however, when they use phrase embeddings combined with phonetic context, they do achieve a significant improvement in a DNN-based system. Wang et al. (2016) enhance word vectors with prosodic information, i.e. update them, achieving significant improvements.…”
Section: NN-based Expressive Speech Synthesis with Sentiment Embeddings
confidence: 99%
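The "update" idea described above (enhancing word vectors with prosodic information by fine-tuning them during training) can be sketched as joint gradient descent on a toy prosodic target. The target value, learning rate, and linear predictor here are illustrative assumptions, not the cited method:

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim = 8

vec = rng.normal(size=emb_dim)      # pretrained-style word vector (toy)
w = rng.normal(size=emb_dim) * 0.1  # linear prosody predictor (toy)
target = 1.0                        # illustrative prosodic target, e.g. mean F0

lr = 0.05
for _ in range(200):
    err = float(vec @ w) - target          # squared-error gradient factor
    grad_w, grad_vec = err * vec, err * w
    w -= lr * grad_w
    vec -= lr * grad_vec                   # the word vector itself is updated,
                                           # absorbing prosodic information

print(float(vec @ w))  # close to the target after training
```

The design choice being illustrated is that the embedding is treated as a trainable parameter rather than a frozen input, so prosodic supervision flows back into the word representation itself.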
“…To name a few, Wang et al. [13] use word embeddings to substitute ToBI and POS tags in RNN-based synthesis, achieving significant system improvement. Wang et al. [14] enhance the input to NN-based systems with continuous word embeddings, and also try to substitute the conventional linguistic input with the word embeddings. They do not achieve a performance improvement; however, when they use phrase embeddings combined with phonetic context, they do achieve a significant improvement in a DNN-based system.…”
Section: Introduction
confidence: 99%
“…They do not achieve a performance improvement; however, when they use phrase embeddings combined with phonetic context, they do achieve a significant improvement in a DNN-based system. Wang et al. [14] enhance word vectors with prosodic information, i.e. update them, achieving significant improvements.…”
Section: Introduction
confidence: 99%
“…Although word vectors have been shown to be effective in various natural language processing tasks [8], the implicit linguistic regularity encoded in these vectors may still be insufficient and noisy for the TTS task. In particular, our previous experiments indicated that, at least on the speech corpus used, word vectors were not significantly better than automatically derived prosodic symbols for TTS systems with an acoustic model based on either a recurrent neural network (RNN) or a deep feedforward neural network (DNN) [9].…”
Section: Introduction
confidence: 99%