ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683815

An End-to-end Network to Synthesize Intonation Using a Generalized Command Response Model

Abstract: The generalized command response (GCR) model represents intonation as a superposition of muscle responses to spike command signals. We have previously shown that the spikes can be predicted by a two-stage system, consisting of a recurrent neural network and a post-processing procedure, but the responses themselves were fixed dictionary atoms. We propose an end-to-end neural architecture that replaces the dictionary atoms with trainable second-order recurrent elements analogous to recursive filters. We demonstr…
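The idea in the abstract, a contour built as a superposition of second-order recursive-filter responses to spike commands, can be illustrated with a minimal sketch. This is not the authors' architecture: the function names, the choice of a double real pole (a critically damped response), and the `(position, amplitude, pole)` command format are all assumptions made for illustration, and nothing here is trained.

```python
def second_order_response(spikes, a1, a2):
    """Run the all-pole recursion y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    out = []
    y1 = y2 = 0.0  # previous two outputs, zero initial state
    for x in spikes:
        y0 = x - a1 * y1 - a2 * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out


def is_stable(a1, a2):
    """Stability triangle for the denominator 1 + a1*z^-1 + a2*z^-2."""
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2


def critically_damped_coeffs(pole):
    """Coefficients for a double real pole: (1 - p*z^-1)^2.

    Assumption: a critically damped element is used purely as an example.
    """
    return -2.0 * pole, pole * pole


def synthesize_contour(commands, length):
    """Sum the filter responses to each spike command.

    `commands` is a hypothetical list of (position, amplitude, pole) tuples.
    """
    contour = [0.0] * length
    for position, amplitude, pole in commands:
        a1, a2 = critically_damped_coeffs(pole)
        if not is_stable(a1, a2):
            raise ValueError("unstable filter coefficients")
        spikes = [0.0] * length
        spikes[position] = amplitude
        response = second_order_response(spikes, a1, a2)
        contour = [c + r for c, r in zip(contour, response)]
    return contour


if __name__ == "__main__":
    # Two spike commands with different amplitudes and time constants.
    print(synthesize_contour([(0, 1.0, 0.5), (3, -0.5, 0.8)], 8))
```

In an end-to-end setting the coefficients would be learned parameters, which is why a stability check such as `is_stable` (or a parameterization that satisfies it by construction) matters: coefficients outside the triangle give a response that grows without bound.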

Citations: cited by 5 publications (4 citation statements)
References: 17 publications
“…However, question intonation across sentences and speakers is very far from homogeneous making learning such forms from a corpus a challenging task. Research in this area is still at an early stage, with the exploration of different neural net models and feature architectures to improve the variation expected in spoken intonation a source of current research [Kenter et al 2019, Marelli et al 2019, Sun et al 2020].…”
Section: Emphasis and Question Intonation
Citation type: mentioning
confidence: 99%
“…The paper [36] offers a suggestion for a demonstration to enhance the synthesized speech’s prosody of mathematical markup language (MathML) based mathematical expressions, while [37] demonstrates that an end-to-end neural network with integrated second-order adaptable linear all-pole digital filters can produce intonation with a natural sound, provided that the proper stability conditions are applied. However, while intonational synthesis speech has come a long way in recent years, it still faces challenges in replicating the full range of natural intonation patterns and nuances of human speech.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…To this end, we use a DNN-based state-of-the-art Merlin TTS system in conjunction with the Festival front-end, two Bidirectional Long Short-Term Memory networks as duration and acoustic models, and the WORLD vocoder. For details on the TTS systems and the training procedure, the reader is referred to [23,24]. By training a TTS system for each speaker, we get 4 speaker-dependent TTS systems.…”
Section: Algorithmic Settings Evaluation and State-of-the-art Measures
Citation type: mentioning
confidence: 99%