9th ISCA Speech Synthesis Workshop (SSW 9), 2016
DOI: 10.21437/ssw.2016-3

Emphasis recreation for TTS using intonation atoms

Abstract: We are interested in emphasis for text to speech synthesis. In speech to speech translation, emphasising the correct words is important to convey the underlying meaning of a message. In this paper, we propose to use a generalised command-response (CR) model of intonation to generate emphasis in synthetic speech. We first analyse the differences in the model parameters between emphasised words in an acted emphasis scenario and their neutral counterpart. We investigate word level intonation modelling using simpl…

Cited by 3 publications (3 citation statements)
References 17 publications

“…A phrase additionally consists of a phrase atom (phrase command in CR model) which models the general shape of the contour and is correlated mainly to the physics of the speakers' lung volume (dotted line in upper plot of Figure 1). Experiments have shown that the proposed model is capable of producing good representations and can transplant emphasis from one language to another [12].…”
Section: Relation To Prior Work · Citation type: mentioning
confidence: 99%
“…Recently, however, this gap has been closed to some extent by studies by Delić et al (2016), Szaszák et al (2016) and , who have shown that GCR atoms correlate rather well with ToBI markers, and can help with detection of emphasis (or stress). Honnet and Garner (2016) have also shown that GCR can be used to synthesise emphasis. Aside from being reassuring, this is intuitive in that ToBI is a mechanism for constructing linguistic cues; it is a semantic level below that of the cues themselves.…”
Section: Linguistic Meaning · Citation type: mentioning
confidence: 99%
“…Some researchers have tried to apply ideas of classical intonation models by predicting their meanings (Chakrasali et al, 2022; Kuczmarski, 2021; Marelli et al, 2019) or identifying intonation segments (Alvarez et al, 2022). Other intonation modelling instances include the transfer of intonation components into neutral synthesized speech (Honnet and Garner, 2016), unsupervised or supervised training of latent prosody space (Sun et al, 2020; Raitio et al, 2020, 2022). In Birkholz and Zhang (2020), microprosody-based intonation modelling is presented as an additional way of improving the naturalness of synthesized speech.…”
Section: Introduction · Citation type: mentioning
confidence: 99%