Speech Prosody 2016
DOI: 10.21437/speechprosody.2016-209

Using hierarchical information structure for prosody prediction in content-to-speech applications

Abstract: State-of-the-art prosody modelling in content-to-speech (CTS) applications still uses the same methodology to predict intonation cues as text-to-speech (TTS) applications, namely the analysis of the generated surface sentences with respect to part of speech, syntactic dependency relations and word order. On the other side, several theoretical studies argue that morphology, syntax, and information (or communicative) structure that organizes a given content (semantic or deep-syntactic structure) with respect to …

Citations: cited by 7 publications (7 citation statements)
References: 17 publications
“…Moreover, the prediction of expressions from text and the synthesis of a particular expression have been integrated together [14]. In addition, prosody has been structured as a multi-level hierarchy for emotional speech synthesis [5], and its correlation with both hierarchical information structure and discourse has also been analyzed for speech synthesis purposes [15,16,17]. However, the general trend is to work on sentence level.…”
Section: Related Work (mentioning)
confidence: 99%
“…A total of five partitions are identified, including three spans at level 1 (L1), a specifier (SP1), theme (T1) and rheme (R1), and two embedded spans at level 2 (L2) in the rheme, a theme (T1(R1)) and a rheme (R1(R1)). A hierarchical thematicity structure of this kind has been shown to correlate better with ToBI labels than binary flat thematicity [9]. Such a correlation still does not solve the problem of a one-to-one mapping between a specific intonation label (e.g., H*) to a static acoustic parameter (e.g., an increase of 50% in fundamental frequency).…”
Section: Motivation and Background (mentioning)
confidence: 99%
“…In the early 2000s, there were some attempts to introduce some basic concepts of Information Structure in TTS applications, in particular, thematicity, understood as the partition of a sentence into theme (i.e., what the sentence is about) and rheme (i.e., what is said about the theme); see [6,7] among others. However, a binary flat representation of thematicity of this kind has been proved to be insufficient to describe long complex sentences, whereas the hierarchical tripartite approach proposed in [8] within the Meaning-Text Theory yields a better correspondence to prosodic patterns as shown in [9,10].…”
Section: Introduction (mentioning)
confidence: 99%
“…The corpus has been processed using Bohnet's [12] joint tagger and dependency parser to obtain lexical and syntactic features and annotated manually with information structure (more precisely, with Thematicity features from Mel'čuk's [9] communicative structure), following the guidelines established by Bohnet et al. [13]. Further details on how information structure is understood can be found in the authors' work [6] and [14] on its correlation to prosody.…”
Section: The Dataset (mentioning)
confidence: 99%