IberSPEECH 2018
DOI: 10.21437/iberspeech.2018-9

Towards expressive prosody generation in TTS for reading aloud applications

Abstract: Conversational interfaces involving text-to-speech (TTS) applications have improved in expressiveness and overall naturalness to a reasonable extent in recent decades. Conversational features, such as speech acts, affective states, and information structure, have been instrumental in deriving more expressive prosodic contours. However, synthetic speech is still perceived as monotonous when a text that lacks those conversational features is read aloud in the interface, i.e., when it is fed directly to the TTS application…

Cited by 2 publications (2 citation statements)
References: 14 publications
“…We highlight the relevance of exploring formal representations of thematicity, such as the MTT's and we foresee promising outcomes when used as basis for implementation of communicatively-oriented models in TTS and conversational agent applications. Preliminary experiments have been carried out to implement a thematicity-to-prosody module in English and German (Domínguez et al. 2017, 2018). In this context, it should be, however, noted that these supervised classification experiments are different from an actual implementation of a prosody module in a TTS application.…”
Section: Discussion (mentioning)
confidence: 99%
“…For this purpose, we carry out machine learning-based classification experiments on a spoken language corpus, which consists of an extract of 109 isolated sentences from the popular Wall Street Journal (WSJ) corpus (Charniak et al. 2000), read aloud by native speakers of English. We opted for a reading-aloud setup because one of our applications is a "reading aloud" agent (Domínguez et al. 2018), and deficiencies in expressive prosody in TTS become evident with the syntactically demanding genre of newspaper material. The sentences in our corpus are annotated with their thematicity structure (both MTT's tripartite hierarchical thematicity and the flat binary theme-rheme dichotomy, which constitutes the state of the art in speech technologies and which we use as the reference thematicity structure) and with their prosodic structure (in terms of acoustic parameter-oriented labels automatically derived from three prosodic elements, namely, F0, intensity and rhythm, and in terms of ToBI labels).…”
Section: Introduction (mentioning)
confidence: 99%