Theoretical studies of the information structure-prosody interface argue that the content packaged as theme and rheme correlates with the intonation of the corresponding sentence, specifically with rising and falling patterns (L*+H LH% and H* LL%, respectively). When this correspondence is used to derive prosody in text-to-speech (TTS) applications, ToBI labels are often mapped statically to acoustic parameters. Such an approach is insufficient to solve the problem of monotonous synthetic voices for two reasons: the resulting prosody enrichment is repetitive, and a flat binary theme-rheme representation cannot adequately describe long, complex sentences. In this paper, we introduce a methodology for a more versatile thematicity-based prosody enrichment based on (i) a hierarchical tripartite thematicity model as proposed in the Meaning-Text Theory, and (ii) a corpus-based approach for the automatic extraction of acoustic parameters (fundamental frequency, breaks, and speech rate) that are mapped to a varied range of prosody control tags for the synthesized speech. This prosody enrichment has been shown to yield better results in a perception test when implemented in a TTS system.
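To make the contrast with a static label-to-parameter mapping concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that the prosody control tags are SSML `prosody` and `break` elements, that the three thematicity labels are theme, rheme, and specifier, and that each label is associated with a range of values rather than a single fixed value. The names `ProsodyRange`, `RANGES`, and `to_ssml`, as well as all numeric ranges, are hypothetical placeholders; in a corpus-based setting the ranges would be estimated from annotated speech data. The hierarchical nesting of thematicity spans is omitted for brevity.

```python
import random
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class ProsodyRange:
    """Hypothetical value ranges for one thematicity label."""
    pitch_pct: tuple[int, int]  # relative F0 change, in percent
    rate_pct: tuple[int, int]   # speech rate, in percent of the default
    break_ms: tuple[int, int]   # pause after the span, in milliseconds


# Placeholder ranges (assumptions, not values from the paper).
RANGES = {
    "theme": ProsodyRange((5, 15), (95, 105), (150, 300)),
    "rheme": ProsodyRange((-10, 0), (90, 100), (300, 500)),
    "specifier": ProsodyRange((0, 10), (100, 110), (100, 200)),
}


def to_ssml(spans, rng):
    """Wrap each (label, text) span in SSML tags with values drawn from its range.

    Drawing from a range, instead of using one fixed value per label,
    avoids producing identical prosody for every span with the same label.
    """
    parts = []
    for label, text in spans:
        r = RANGES[label]
        pitch = rng.randint(*r.pitch_pct)
        rate = rng.randint(*r.rate_pct)
        pause = rng.randint(*r.break_ms)
        parts.append(
            f'<prosody pitch="{pitch:+d}%" rate="{rate}%">{escape(text)}</prosody>'
            f'<break time="{pause}ms"/>'
        )
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    spans = [("theme", "The new model"), ("rheme", "was trained on a small corpus")]
    print(to_ssml(spans, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the output reproducible while still allowing variation across spans; a static mapping corresponds to the special case in which every range collapses to a single value.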