Interspeech 2017 2017
DOI: 10.21437/interspeech.2017-428
|View full text |Cite
|
Sign up to set email alerts
|

Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
6
0

Year Published

2018
2018
2021
2021

Publication Types

Select...
4
3

Relationship

0
7

Authors

Journals

citations
Cited by 10 publications
(6 citation statements)
references
References 10 publications
0
6
0
Order By: Relevance
“…Innovative uses of Long short-term memory recurrent neural networks (LSTM-RNNs) have been explored to various speech applications as can be seen in Reference [21], where the proposed architecture can synthesise natural sounding speech without requiring utterance-level batch processing. Likewise, an LSTM's performance is enhanced via hierarchical cascading in Reference [22].…”
Section: Related Workmentioning
confidence: 99%
“…Innovative uses of Long short-term memory recurrent neural networks (LSTM-RNNs) have been explored to various speech applications as can be seen in Reference [21], where the proposed architecture can synthesise natural sounding speech without requiring utterance-level batch processing. Likewise, an LSTM's performance is enhanced via hierarchical cascading in Reference [22].…”
Section: Related Workmentioning
confidence: 99%
“…Moreover, there have been various efforts towards speech generation and the most popular ones include HMM-based speech synthesis models [29], specialized incremental dialogue systems [12], and DNN-based systems [34]. State-of-theart approaches for text-to-speech leverage various forms of DNNs; in [20] a system is presented that builds models based on waveforms, in [28] the models are based on spectrograms, while in [23] hierarchical cascading is used to improve an LSTM network. Furthermore, day-to-day voice enabled products like Apple's Siri, Amazon's Alexa/Echo, Microsoft's Cortana and Google's Assistant are joined by more healthcare oriented agents like Aiva 2 , Merit 3 , Suki 4 and Robin 5 .…”
Section: Related Workmentioning
confidence: 99%
“…Hybrid synthesis systems make use of the advantages of both SPSS and unit selection based speech synthesis. The hybrid TTS synthesis system applies the statistical acoustic model to generate either speech parameter trajectories [16,17], or unit embeddings [18][19][20][21] to guide the unit selection process. In this paper, we propose a hybrid TTS synthesis system which uses an improved sequence-to-sequence (textto-feature) acoustic model to generate the speech parameter trajectory and trajectory tiling based waveform concatenation to synthesize the final waveform.…”
Section: Introductionmentioning
confidence: 99%