A major challenge for text-to-speech synthesis is producing expressive speech, mainly because it is difficult to synthesise high-quality speech from expressive corpora. With growing interest in audiobook corpora for speech synthesis, there is demand for synthetic speech that is rich in prosody, emotion and voice style. In this work, Self-Organising Feature Maps (SOFM) are used to cluster the speech data according to voice quality parameters of the glottal source, in order to map out the variety of voice styles in the corpus. A subjective evaluation showed that this clustering method successfully separated the speech data into groups of utterances associated with different voice characteristics. The approach can be applied to unit-selection synthesis by selecting appropriate subsets of the data to synthesise utterances in specific voice styles. It can also be used in parametric speech synthesis to model different voice styles separately.
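To make the clustering step concrete, here is a minimal sketch of an online Kohonen self-organising map in pure Python. It assumes each utterance has already been reduced to one fixed-length vector of glottal voice quality parameters. The three features used in the demo (NAQ, QOQ and H1-H2) and all their values are hypothetical stand-ins, not the parameter set or data of the work above, and the grid size, learning-rate decay and neighbourhood schedule are likewise illustrative choices. In this sketch, utterances that share a best-matching map node form one group.

```python
import math
import random


def standardise(data):
    """Z-score each feature so parameters on different scales weigh equally."""
    n, dim = len(data), len(data[0])
    means = [sum(x[i] for x in data) / n for i in range(dim)]
    # Fall back to 1.0 for a constant feature to avoid division by zero.
    stds = [math.sqrt(sum((x[i] - means[i]) ** 2 for x in data) / n) or 1.0
            for i in range(dim)]
    return [[(x[i] - means[i]) / stds[i] for i in range(dim)] for x in data]


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is nearest (Euclidean) to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols self-organising map with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every node with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            # Learning rate decays linearly; radius decays linearly to a floor of 0.5.
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Grid (not feature-space) distance decides how much a node moves.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def cluster(weights, data):
    """Group utterance indices by their best-matching map node."""
    groups = {}
    for idx, x in enumerate(data):
        groups.setdefault(best_matching_unit(weights, x), []).append(idx)
    return groups


if __name__ == "__main__":
    # Made-up vectors [NAQ, QOQ, H1-H2 in dB], one per utterance (illustration only).
    utterances = [
        [0.16, 0.55, 10.0], [0.15, 0.52, 9.0], [0.17, 0.57, 11.0],  # breathier-sounding
        [0.06, 0.32, 2.0], [0.07, 0.30, 1.5], [0.05, 0.34, 2.5],    # tenser-sounding
    ]
    feats = standardise(utterances)
    som = train_som(feats, rows=2, cols=2, epochs=50)
    for node, members in sorted(cluster(som, feats).items()):
        print(node, members)
```

Because the Gaussian neighbourhood also pulls nodes that sit next to the winner on the grid, adjacent nodes tend to end up holding acoustically similar utterances, so a larger map can be read as a rough layout of the voice styles present; each node's member list is then a candidate subset to listen to or to select from.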
This paper introduces the open-source muster speech engine (Muse) for speech technology research. The Muse platform abstracts the common data types and software used by speech technology researchers. It is designed to help researchers run repeatable experiments that are not hard-coded to a specific platform, language, algorithm or corpus. It includes a scripting language and a shell through which users can interact with its various components. The presentation of this paper will be accompanied by a demo at the SLT workshop.