Modeling of Fundamental Frequency Contour of Thai Expressive Speech using Fujisaki's Model and Structural Model

Problem statement: Studies on Thai expressive (emotional) speech have been conducted for years, and most of them aim to analyze the characteristics of Thai expressive speech. However, no conclusive review of these studies has been carried out to support further work on speech technology or applications of Thai expressive speech. Approach: This study reviews research on Thai expressive speech from several aspects: analysis of fundamental frequency (F0) contours using Fujisaki's model, analysis of F0 contours using the structural model and speech compression in noisy environments. Four speaking styles are considered: enjoyable, sad, angry and reading. Results: Two successful F0 models are compared. One is Fujisaki's model, which has been applied to many tonal and toneless languages; the other is the structural model, which has been applied primarily to Mandarin Chinese. In addition, a study of speech compression for noise-corrupted Thai expressive speech using two coding methods, CS-ACELP and MP-CELP, is summarized. Conclusion: Both mathematical models have been successfully applied to model the F0 contour of Thai expressive speech. For speech compression, the coding method, the type and level of noise and the speaker's gender all influence the quality of the coded speech.
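For reference, Fujisaki's model as it is commonly formulated for tonal languages (tone commands in place of accent commands, with amplitudes that may be positive or negative) is sketched below. The notation follows common usage in the literature rather than the reviewed studies, which may parameterize the model differently: $F_b$ is the baseline frequency, $A_{p_i}$ and $T_{0i}$ are the amplitude and onset of the $i$-th phrase command, $A_{t_j}$, $T_{1j}$ and $T_{2j}$ are the amplitude, onset and offset of the $j$-th tone command, and $\alpha$, $\beta$, $\gamma$ are the phrase control constant, tone control constant and ceiling level.

$$\ln F_0(t) = \ln F_b + \sum_{i=1}^{I} A_{p_i}\, G_p(t - T_{0i}) + \sum_{j=1}^{J} A_{t_j}\left[ G_t(t - T_{1j}) - G_t(t - T_{2j}) \right]$$

$$G_p(t) = \begin{cases} \alpha^2\, t\, e^{-\alpha t}, & t \ge 0 \\ 0, & t < 0 \end{cases} \qquad G_t(t) = \begin{cases} \min\left[ 1 - (1 + \beta t)\, e^{-\beta t},\ \gamma \right], & t \ge 0 \\ 0, & t < 0 \end{cases}$$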

Section: Results (mentioning)

Thai Expressive Speech Processing Technology: A Review

Chomphan¹

2012

“…2, the following charts are summarized (Chomphan, 2011b). First, the noise effects on the male-angry-style speech are summarized in terms of RMSE values with four different types of noises and five different levels of noises in Fig.…”

Section: Results (mentioning)

Effects of Environmental Noises on Fundamental Frequency Contours of Thai Expressive Speech

N.¹

2012

Problem statement: Thai expressive speech has been studied only for a short time. An important feature of speech is the fundamental frequency (F0), which defines speech prosody and can be used to distinguish between types of expressive speech. A previous study concluded that environmental noises affect the F0 contours of Thai dialects; however, the prosodic information of Thai speech with various speaking styles under several types of noise had not been studied. Approach: Four speaking styles were used, and four types of environmental noise were recorded at different power levels; the speech and noise were then mixed together. F0 contours were extracted for each combination of speaking style, noise type and noise level, and the Root Mean Square Error (RMSE) between the F0 contour of the clean speech and that of the noise-corrupted speech was calculated. Results: The four noise types were train, factory, car and air conditioner. Each speaking style included 10 samples of 10 utterances of male and female speech. Five noise levels, ranging from 0 to 20 dB relative to the clean speech, were used. The different noise types had different effects, and the four speaking styles also produced different RMSEs. Conclusion: The recorded noises deteriorate the F0 contours for all speaking styles in Thai.
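The mixing and scoring steps in the Approach can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the noise levels are treated as signal-to-noise ratios (SNRs) in dB, the F0 contours are assumed to come from some pitch tracker as frame-aligned arrays in Hz with 0 marking unvoiced frames, and the RMSE is taken only over frames voiced in both contours. The function names `mix_at_snr` and `f0_rmse` are hypothetical.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the clean-to-noise power ratio equals snr_db.

    Assumption: the noise level is defined as an SNR over the whole utterance.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(clean):
        # Repeat a short noise recording so it covers the whole utterance.
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale factor giving p_clean / (scale^2 * p_noise) = 10^(snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def f0_rmse(f0_clean, f0_noisy):
    """RMSE (Hz) between two frame-aligned F0 contours.

    Assumption: frames with F0 == 0 are unvoiced and are excluded unless
    both contours are voiced at that frame.
    """
    a = np.asarray(f0_clean, dtype=float)
    b = np.asarray(f0_noisy, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    voiced = (a > 0) & (b > 0)
    if not np.any(voiced):
        return float("nan")
    return float(np.sqrt(np.mean((a[voiced] - b[voiced]) ** 2)))
```

For example, calling `mix_at_snr(clean, noise, snr)` for `snr` in `(0, 5, 10, 15, 20)` would produce five levels spanning 0 to 20 dB; the 5 dB spacing is an assumption, since the abstract gives only the range. Whether mismatched voicing decisions should also be penalized is a design choice the abstract does not settle.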

“…They are baseline frequency, number of phrase commands, number of tone commands, phrase command duration, tone command duration, amplitude of phrase command and amplitude of tone command. The derived output parameters are mostly extracted for Thai tones (Chomphan, 2011).…”

Section: Fujisaki's Model (mentioning)

Analytical Study of Fujisaki's Model of Fundamental Frequency Contour for Thai Tones

S.¹

2012

Problem statement: In a tonal language, tone is an important prosodic feature of a syllable that identifies the meaning of that syllable or part of a word. It is crucial to model the tone-related features of speech to achieve the greatest naturalness in speech communication. Approach: The study presents an approach to analyzing the model parameters of Thai tones for both genders. Fujisaki's model, a successful model of the fundamental frequency contour, is selected. Seven parameters are derived: baseline frequency, the number of phrase commands, the number of tone commands, phrase command duration, tone command duration, phrase command amplitude and tone command amplitude. Results: The experiments use 20 syllables; each syllable includes 50 samples of each tone for male and female speech. All five tones are recorded in the same environment, giving ten thousand samples in the speech corpus. The results show that Thai tones are characterized by the derived parameters. Conclusion: Thai tones can be discriminated using the derived parameters of Fujisaki's model.
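A minimal sketch of how an F0 contour is generated from the commands in Fujisaki's model, and how several of the seven parameter types could be read off a set of commands, is given below. It assumes the standard formulation for tonal languages, ln F0(t) = ln Fb + Σ Ap·Gp(t − T0) + Σ At·[Gt(t − T1) − Gt(t − T2)], with impulse phrase commands and stepwise tone commands that may be positive or negative. The constants α = 3/s, β = 20/s and γ = 0.9 are typical values from the literature, not values reported in the study. All function names are hypothetical, and because the abstract does not define phrase command duration for impulse-type phrase commands, that parameter is left out.

```python
import numpy as np


def phrase_response(t, alpha=3.0):
    """Phrase control response Gp(t) = alpha^2 * t * exp(-alpha t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)  # clamp to avoid overflow of exp for negative t
    return np.where(t >= 0, alpha ** 2 * tp * np.exp(-alpha * tp), 0.0)


def tone_response(t, beta=20.0, gamma=0.9):
    """Tone control response Gt(t) = min(1 - (1 + beta t) exp(-beta t), gamma) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)
    g = np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)
    return np.where(t >= 0, g, 0.0)


def fujisaki_f0(t, fb, phrase_cmds, tone_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Synthesize F0 (Hz) at times t (s).

    phrase_cmds: list of (Ap, T0) impulse commands.
    tone_cmds:   list of (At, T1, T2) step commands; At may be negative.
    """
    t = np.asarray(t, dtype=float)
    log_f0 = np.full_like(t, np.log(fb))
    for ap, t0 in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for at, t1, t2 in tone_cmds:
        log_f0 += at * (tone_response(t - t1, beta, gamma)
                        - tone_response(t - t2, beta, gamma))
    return np.exp(log_f0)


def command_summary(fb, phrase_cmds, tone_cmds):
    """Collect the well-defined parameter types from a command set (hypothetical helper)."""
    return {
        "baseline_frequency": fb,
        "num_phrase_commands": len(phrase_cmds),
        "num_tone_commands": len(tone_cmds),
        "phrase_amplitudes": [ap for ap, _ in phrase_cmds],
        "tone_amplitudes": [at for at, _, _ in tone_cmds],
        "tone_durations": [t2 - t1 for _, t1, t2 in tone_cmds],
    }
```

In an analysis setting like the one described, the commands would be estimated by fitting `fujisaki_f0` to an observed contour (analysis-by-synthesis); the fitting procedure itself is not described in the abstract and is not shown here.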