Interspeech 2018
DOI: 10.21437/interspeech.2018-1511

EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System

Abstract: We present EMPHASIS, an emotional phoneme-based acoustic model for speech synthesis systems. EMPHASIS includes a phoneme duration prediction model and an acoustic parameter prediction model. It uses a CBHG-based regression network to model the dependencies between linguistic features and acoustic features. We modify the input and output layer structures of the network to improve performance. For the linguistic features, we apply a feature grouping strategy to enhance emotional and prosodic features. The aco…

Cited by 15 publications (7 citation statements)
References 12 publications
“…Later, as the development of statistics machine learning, statistical parametric speech synthesis (SPSS) is proposed [409,350,418,351], which predicts parameters such as spectrum, fundamental frequency and duration for speech synthesis. From 2010s, neural network-based speech synthesis [419,278,76,417,369,187,248,376] has gradually become the dominant methods and achieved much better voice quality.…”
Section: History of TTS Technology (mentioning)
confidence: 99%
“…In recent years, emotional speech synthesis based on deep learning has become an emergent area of interest. LSTM-based emotional statistical parametric speech synthesis method is proposed by [2], and [25] further tried to synthesize questionable and exclamatory speech rather than specific emotional categories with a real time synthesis system. With the rapid development of deep learning, convolutional neural networks been proven effective in the field of audio processing.…”
Section: Related Work (mentioning)
confidence: 99%
“…synthesis Li et al, 2018), to neural network based parametric synthesis (Arik et al, 2017), and to currently end-to-end neural models. The end-to-end models directly map input text or phonetic characters to output speech, which greatly simplifies the training pipeline and reduces the requirements for linguistic and acoustic knowledge.…”
Section: Introduction (mentioning)
confidence: 99%