For the purpose of constructing a naturalistic emotional speech database, a novel paradigm for collecting naturalistic emotional speech during spontaneous Japanese dialog was proposed. The proposed paradigm was assessed by investigating whether the collected speech contains and conveys rich emotions both psychologically and acoustically. To encourage speakers to experience and express natural and vivid emotions, a Massively Multiplayer Online Role-Playing Game (MMORPG) was adopted as the speakers' task. They were asked to play the MMORPG together while discussing strategies for their in-game tasks over a voice chat system. Recording lasted one hour per speaker, for a total of approximately 14 hours. The results of emotional labeling of the collected speech supported the validity of the paradigm, showing higher inter-labeler agreement than chance level. In addition, the paradigm proved superior to other collection paradigms in the quantity of emotional speech obtained, with a significantly higher rate of labeled instances for our speech material (73%; χ²(2) = 27659.87, p < 0.001) than for other speech materials. Finally, an acoustic analysis also supported the validity of the paradigm, showing a significant difference between nonemotional and emotional utterances (p < 0.05).
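The quantity comparison above is a chi-squared test on labeling rates across speech materials. As a rough sketch of how such a test is computed (using scipy and purely hypothetical counts, not the study's data), a 3×2 contingency table of labeled versus unlabeled instances yields the reported two degrees of freedom:

```python
# Illustrative only: a chi-squared test of independence on labeling rates,
# using made-up counts (NOT the counts from the study).
from scipy.stats import chi2_contingency

# Rows: speech materials; columns: [labeled as emotional, not labeled].
# Hypothetical counts; the first row is chosen to give roughly a 73% labeling rate.
table = [
    [7300, 2700],   # proposed MMORPG-based material (hypothetical)
    [4100, 5900],   # comparison material A (hypothetical)
    [3800, 6200],   # comparison material B (hypothetical)
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3g}")   # dof = (3-1)*(2-1) = 2
```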
This paper describes the command-response model for F0 contour generation, originally developed for common Japanese, and demonstrates its capability of generating F0 contours of various other languages with minor language-specific modifications. The model is especially useful in multilingual speech synthesis, since the same mechanism can be driven by language-specific patterns of input commands to produce F0 contours of utterances in the respective languages.
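The command-response (Fujisaki) model superposes, in the logarithmic frequency domain, phrase components (responses of a critically damped second-order system to impulse commands) and accent components (responses to stepwise commands) on a constant baseline value. The sketch below illustrates that generation mechanism; the constants alpha, beta, gamma and the example commands are conventional illustrative values, not parameters taken from this paper.

```python
import numpy as np

def phrase_component(t, alpha=3.0):
    """Phrase control response Gp(t) = alpha^2 * t * exp(-alpha*t) for t >= 0, else 0."""
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * np.clip(t, 0, None)), 0.0)

def accent_component(t, beta=20.0, gamma=0.9):
    """Accent control response Ga(t) = min(1 - (1 + beta*t)*exp(-beta*t), gamma) for t >= 0, else 0."""
    g = 1.0 - (1.0 + beta * t) * np.exp(-beta * np.clip(t, 0, None))
    return np.where(t >= 0, np.minimum(g, gamma), 0.0)

def f0_contour(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + sum of phrase responses + sum of accent responses; returned in Hz."""
    ln_f0 = np.full_like(t, np.log(fb))
    for t0, ap in phrase_cmds:                         # (onset time T0, magnitude Ap)
        ln_f0 += ap * phrase_component(t - t0, alpha)
    for t1, t2, aa in accent_cmds:                     # (onset T1, offset T2, amplitude Aa)
        ln_f0 += aa * (accent_component(t - t1, beta, gamma)
                       - accent_component(t - t2, beta, gamma))
    return np.exp(ln_f0)

# Illustrative commands (not from the paper): one phrase command, two accent commands.
t = np.linspace(0.0, 3.0, 600)
f0 = f0_contour(t, fb=110.0,
                phrase_cmds=[(0.0, 0.5)],
                accent_cmds=[(0.3, 0.8, 0.4), (1.4, 2.0, 0.3)])
```

Because the language-independent part is confined to the two response functions, adapting the mechanism to another language amounts to supplying different command patterns (timings and amplitudes), which is the property the abstract highlights for multilingual synthesis.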
Accentuation serves to express both the discrete information concerning the accent type of a prosodic word and the continuous information concerning its prominence. This paper examines the latter aspect of accentuation using recorded radio news read by announcers. The amplitude of the accent command was extracted from an F0 contour and used as an index for the level of accentuation. Statistical analysis of the accent command amplitude confirmed the difference between accented and unaccented types. Further analysis of the relationship between amplitudes of two adjoining accent commands also revealed a marked difference in the characteristics of these two types.
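As an illustration of the kind of statistical comparison described, and not the paper's actual analysis or data, accent-command amplitudes grouped by accented versus unaccented word type can be compared with a nonparametric test:

```python
# Illustrative only: comparing accent-command amplitudes between accented and
# unaccented prosodic words, using made-up amplitude values (not the paper's data).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
accented   = rng.normal(0.45, 0.10, 200)   # hypothetical Aa values for accented words
unaccented = rng.normal(0.20, 0.08, 200)   # hypothetical Aa values for unaccented words

stat, p = mannwhitneyu(accented, unaccented, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3g}")
```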