Tone information is very important to speech recognition in a tonal language such as Thai. In this article, we present a method for isolated Thai tone recognition. First, we define three sets of tone features to capture the characteristics of Thai tones and employ a feedforward neural network to classify tones based on these features. Next, we describe several experiments using the proposed features. The experiments are designed to study the effect of initial consonants, vowels, and final consonants on tone recognition. We find some correlations between tones and other phonemes, and the recognition performance is satisfactory. A human perception test is then conducted to benchmark the recognition rate; the human recognition rate is much lower than that of the machine. Finally, we explore various combination schemes to enhance the recognition rate, and find further improvements in most experiments.
The Support Vector Machine (SVM) has recently been introduced as a new pattern classification technique. It learns the boundary between samples belonging to two classes by mapping the input samples into a high-dimensional space and seeking a separating hyperplane in that space. This paper describes an application of SVMs to two phoneme recognition problems: the 5 Thai tones and 12 Thai vowels, spoken in isolation. The best tone recognition results are 96.09% for the inside test and 90.57% for the outside test; the best vowel recognition results are 95.51% for the inside test and 87.08% for the outside test.
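The idea of mapping inputs into a higher-dimensional space and separating them with a hyperplane can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the mapping `feature_map`, the hand-picked `weights` and `bias`, and the toy one-dimensional data are hypothetical and are not taken from the paper. A real SVM learns the hyperplane by maximizing the margin, and usually applies the mapping implicitly through a kernel function.

```python
def feature_map(x):
    """Hypothetical explicit mapping phi(x) = (x, x^2) into a 2-D space."""
    return (x, x * x)


def classify(x, weights=(0.0, 1.0), bias=-1.0):
    """Label x by the side of the hyperplane w . phi(x) + b = 0 it falls on.

    The weights and bias are hand-picked for this toy example; an SVM
    would learn them from training data.
    """
    z = feature_map(x)
    score = sum(w * zi for w, zi in zip(weights, z)) + bias
    return 1 if score > 0 else -1


# Toy data: the outer points belong to class +1, the inner points to class -1.
# No single threshold on x separates them, but a hyperplane in the mapped
# space (here, x^2 = 1) does.
points = [-2.0, -0.5, 0.5, 2.0]
labels = [classify(x) for x in points]
```

Because an SVM separates two classes, a five-tone or twelve-vowel task needs some multi-class arrangement of binary classifiers; the abstract does not say which one was used, so none is sketched here.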
Dysarthric speech recognition (DSR) is under continuous development to improve the quality of life of people with speech impairment. This study investigated the effect of pauses in DSR. The speech corpus consists of 40 words in two subsets: (i) 20 bisyllabic words designed to contain all types of final consonant-initial consonant junction in the Thai language, and (ii) 20 monosyllabic words that share some phonemes with the first subset. Four children with cerebral palsy and dysarthria and two normal children participated. The DSR system was trained using Hidden Markov Models (HMMs) in three approaches: phoneme-based (PSR), word-based (WSR), and pause-reducing word-based (PRWSR). In the third approach, the pauses within words were automatically detected and reduced. The accuracy of PRWSR was compared with that of WSR while varying the duration of the pauses remaining in PRWSR. Speech samples from the normal children were also recognized for comparison. The results showed that PSR provided the highest recognition rate. The recognition rates of WSR and PRWSR were not significantly different, although PRWSR yielded a slightly higher rate than WSR. Among the remaining pause durations tested, 100 ms performed better than any other.
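The pause-reduction step can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe how pauses were detected, so the frame-energy threshold used here, the function name `reduce_pauses`, and all parameter names and defaults other than the 100 ms remaining pause are hypothetical.

```python
def reduce_pauses(samples, sample_rate, max_pause_ms=100,
                  frame_ms=10, energy_threshold=1e-4):
    """Shorten every low-energy run so that at most max_pause_ms remains.

    Pause detection by mean frame energy is an assumption; the source
    only states that pauses were detected and reduced automatically.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    max_pause_frames = max_pause_ms // frame_ms
    output = []
    silent_run = 0  # number of consecutive low-energy frames seen so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy < energy_threshold:
            silent_run += 1
            if silent_run > max_pause_frames:
                continue  # drop silence beyond the allowed pause length
        else:
            silent_run = 0
        output.extend(frame)
    return output


# Synthetic word: 100 ms of "speech", a 300 ms pause, 100 ms of "speech",
# sampled at 1 kHz so that one sample corresponds to one millisecond.
word = [0.5] * 100 + [0.0] * 300 + [0.5] * 100
shortened = reduce_pauses(word, sample_rate=1000)
```

Varying `max_pause_ms` corresponds to the comparison of remaining pause durations described in the abstract; pauses that are already shorter than the limit pass through unchanged.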