Enumerating differences between various communicative functions for purposes of Czech expressive speech synthesis in limited domain

Matoušek

2013

Speech and Computer

Abstract. In our recent work, a method for enumerating differences between various expressive categories (communicative functions) was proposed. To improve the overall impact of this approach on both the quality of synthetic expressive speech and the perception of expressivity by listeners, a few modifications are suggested in this paper. The main ones consist in a different way of processing expressive data and calculating the penalty matrix. A complex evaluation using listening tests and some auxiliary measures was performed.

Improvements in Czech Expressive Speech Synthesis in Limited Domain

Matoušek

2013

Speech and Computer

Robust Methodology for TTS Enhancement Evaluation

Tihelka

Text, Speech, and Dialogue

Hanzlíček

2013

Abstract. The paper points to problematic and usually neglected aspects of using listening tests for TTS evaluation. It shows that a simple random selection of phrases to be listened to may not cover the cases that are relevant to the evaluated TTS system. It also shows that a reliable phrase set cannot be chosen without deeper knowledge of the distribution of differences in synthetic speech, which are obtained by comparing the output of the evaluated TTS system with that of a baseline system. Given such knowledge, a method is proposed for evaluating the reliability of listening tests, expressed as an estimate of the possible invalidity of a conclusion derived from the listening results, and it is demonstrated on real examples.

Dialogue act based expressive speech synthesis in limited domain for the Czech language

Matoušek

Hanzlíček

et al. 2020

IJCAI

This paper deals with expressive speech synthesis in a dialogue. Dialogue acts (discrete expressive categories) are used for expressivity description. The aim of the work is to create a procedure for the development of expressive speech synthesis for a dialogue system in a limited domain. The domain is here limited to dialogues between a human and a computer on a given topic of reminiscing about personal photographs. To incorporate expressivity into synthetic speech, modifications of current algorithms used for neutral speech synthesis are made. An expressive speech corpus is recorded and annotated using a predefined set of dialogue acts, and its acoustic analysis is performed. Unit selection and HMM-based methods are used to synthesize expressive speech, and an evaluation using listening tests is presented. The listeners assess two basic aspects of synthetic expressive speech for isolated utterances: speech quality and expressivity perception. The evaluation is also performed for utterances in a dialogue to assess the appropriateness of synthetic expressive speech. It can be concluded that synthetic expressive speech is rated positively even though it is of worse quality than neutral speech synthesis. Nevertheless, synthetic expressive speech is able to transmit expressivity to listeners and to improve the naturalness of the synthetic speech.

Summary: A method for expressive speech synthesis in Czech has been developed.