Abstract. Spelling recognition is an approach to enhance a speech recognizer's ability to cope with incorrectly recognized words and out-of-vocabulary words. This paper presents a general framework for Thai speech recognition enhanced with spelling recognition. To implement Thai spelling recognition, the Thai alphabet and its spelling methods are analyzed. Based on hidden Markov models, we propose a method to construct a Thai spelling recognition system using an existing continuous speech corpus. To compensate for speed differences between spelling utterances and continuous speech utterances, the utterance speed is adjusted. Our system achieves up to 87.37% correctness and 87.18% accuracy with the mix-type language model.
Abstract. Spelling recognition has been used for several purposes, such as enhancing speech recognition systems and implementing name retrieval systems. In tonal languages, tone information is an important clue, in addition to phones, for recognizing speech. In this paper, we present a method to improve the accuracy of spelling recognition in Thai, a tonal language, by incorporating tone-related acoustic features into a well-known front-end feature, Perceptual Linear Prediction (PLP) coefficients. The proposed method uses three kinds of tone information to enhance the original features: fundamental frequency (pitch), pitch delta, and pitch acceleration. Compared to the baseline obtained with the original features, our HMM-based recognition model improves letter accuracy by 1.73%, 2.85% and 3.16% for the close-type, mix-type and open-type language models, respectively.
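As a rough illustration of the feature augmentation described in this abstract, the sketch below appends pitch, pitch delta and pitch acceleration to each frame of an existing feature matrix. The abstract does not say how the delta terms are computed; the regression formula over a window of plus or minus two frames, the edge-frame replication, and the names `delta` and `append_tone_features` are assumptions made for this example only. Handling of unvoiced frames, where pitch is undefined, is left out.

```python
from typing import List, Sequence


def delta(values: Sequence[float], window: int = 2) -> List[float]:
    """Regression-based delta over +/- `window` frames.

    Assumption: edge frames are replicated when the window runs past
    the start or end of the utterance.
    """
    n = len(values)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        num = 0.0
        for k in range(1, window + 1):
            ahead = values[min(t + k, n - 1)]
            behind = values[max(t - k, 0)]
            num += k * (ahead - behind)
        out.append(num / denom)
    return out


def append_tone_features(
    plp_frames: Sequence[Sequence[float]], f0: Sequence[float]
) -> List[List[float]]:
    """Append [pitch, pitch delta, pitch acceleration] to every frame.

    `plp_frames` holds one feature vector per frame (hypothetical input;
    PLP extraction itself is not shown). `f0` holds one pitch value per frame.
    """
    if len(plp_frames) != len(f0):
        raise ValueError("feature frames and pitch track must have equal length")
    d = delta(f0)   # first-order difference of pitch
    a = delta(d)    # acceleration is the delta of the delta
    return [list(frame) + [p, dp, ap]
            for frame, p, dp, ap in zip(plp_frames, f0, d, a)]
```

Each output frame is three dimensions longer than its input frame, so an HMM trained on the augmented vectors would need its observation dimension increased accordingly.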
Spelling recognition is an approach to enhance a speech recognizer's ability to cope with incorrectly recognized words and out-of-vocabulary words. This paper presents a general framework for Thai speech recognition enhanced with spelling recognition. To implement Thai spelling recognition, the Thai alphabet and its spelling methods are analyzed. Based on hidden Markov models, we propose a method to construct a Thai spelling recognition system using an existing continuous speech corpus. To compensate for the difference between spelling utterances and continuous speech utterances, the utterance speed is adjusted. Assigning different numbers of states to syllables with different durations helps improve recognition accuracy. Our system achieves up to 79.38% accuracy.
Spelling speech recognition can be applied for several purposes, including enhancing speech recognition systems and implementing name retrieval systems. This paper presents an approach to construct three recognizers, based on hidden Markov models (HMMs), for the three commonly used Thai spelling methods. Thai phonetic characteristics, the alphabet system and the spelling methods are analyzed. For the first spelling method, two recognizers are explored: one trained on a small spelling corpus and one trained on an existing large continuous speech corpus. To address the speed difference between spelling utterances and continuous speech utterances, the utterance speed is adjusted. Two alternative language models, bigram and trigram, are investigated to evaluate the performance of spelling speech recognition under three environments: close-type, open-type and mix-type language models. For the first spelling method, our approach achieves up to 93.09% letter correct rate (LCR) and 92.45% letter accuracy (LA) with the trigram language model under the mix-type environment and the acoustic model trained on the small spelling corpus. Under the same conditions, we obtained 81.12% LCR and 76.32% LA for the second spelling method, and 78.47% LCR and 71.75% LA for the third spelling method. Analysis of the results shows that the main source of errors is letter substitution, mostly triggered by confusion between similar consonant phones and between short/long vowel pairs.
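The abstract reports both LCR and LA without defining them. The sketch below assumes the conventional HTK-style definitions, where LCR = H / N and LA = (H - I) / N, with H the number of correctly recognized letters, I the number of insertions and N the number of reference letters. This is an assumption about the paper's scoring rather than something the abstract confirms, and the function names `align_counts` and `letter_scores` are hypothetical.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) from a
    minimum-edit-distance alignment of hypothesis against reference."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back through the table to count each error type.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def letter_scores(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Return (LCR, LA) in percent under the assumed HTK-style definitions."""
    if not ref:
        raise ValueError("reference must contain at least one letter")
    hits, _subs, _dels, ins = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * hits / n, 100.0 * (hits - ins) / n
```

Under these definitions LA can never exceed LCR, because insertions are penalized only in LA. That is consistent with every LCR/LA pair quoted in the abstract, and the larger gap for the second and third spelling methods would then indicate more inserted letters.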