Problem statement: Building an utterance training system for the Indonesian language requires a speech recognition system designed for Indonesian. However, such a system often performs poorly on non-native speech, because pronunciation variants in non-native utterances can cause substitution and deletion errors. This research investigated these pronunciation variants and proposed an acoustic model adaptation to improve the performance of the system.
Approach: The proposed acoustic model adaptation worked in three steps: (1) analyzing pronunciation variants with knowledge-based and data-derived methods; (2) aligning the knowledge-based and data-derived results to list frequently mispronounced phones together with their variants; and (3) performing a state-clustering procedure using the list obtained in the second step. In addition, three Speaker Adaptation (SA) techniques were combined with the acoustic model adaptation and compared with each other. To evaluate and tune the adaptation techniques, a perceptual evaluation by three human raters was performed to obtain the "true" recognition results.
Results: Compared with the system without adaptation, the proposed method achieved average gains in Hit + Rejection (the percentage of utterances that the system correctly accepts or correctly rejects, in agreement with the human raters) of 2.9 points for native subjects and 2 points for non-native subjects. Combining SA with the acoustic model adaptation yielded average gains in Hit + Rejection of 12.7 points for native students and 6.2 points for non-native students.
Conclusion/Recommendations: Performance evaluation of the adapted system demonstrated that the proposed acoustic model adaptation can improve Hit, although False Alarm (FA, the percentage of utterances that the system incorrectly accepts even though the human raters reject them) increases slightly.
The performance of the proposed acoustic model adaptation depends strongly on how effectively the state-clustering procedure recovers only in-vocabulary words. In future research, a confidence measure to discriminate between in-vocabulary and out-of-vocabulary words will be investigated.
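The evaluation measures used above (Hit, Rejection, False Alarm, and Hit + Rejection) compare the system's accept/reject decision for each utterance with the human reference. The following is a minimal sketch, not the authors' evaluation code. It assumes that (a) the three raters' judgments are merged into one reference label per utterance by majority vote, and (b) every percentage is taken over the total number of utterances. Neither detail is stated in the abstract. The names `majority_vote`, `score`, `AgreementScores`, and the `miss` field are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AgreementScores:
    hit: float          # % of utterances accepted by both the system and the humans
    rejection: float    # % of utterances rejected by both the system and the humans
    false_alarm: float  # % accepted by the system but rejected by the humans
    miss: float         # % rejected by the system but accepted by the humans (hypothetical name)

    @property
    def hit_plus_rejection(self) -> float:
        # Agreement between the system and the human reference
        return self.hit + self.rejection


def majority_vote(rater_labels: Sequence[Sequence[bool]]) -> list[bool]:
    """Merge per-rater accept/reject labels into one reference label per utterance.

    rater_labels[r][i] is True if rater r accepted utterance i.
    Majority voting is an ASSUMPTION; with three raters it never ties.
    """
    n_raters = len(rater_labels)
    n_utts = len(rater_labels[0])
    return [sum(r[i] for r in rater_labels) * 2 > n_raters for i in range(n_utts)]


def score(system: Sequence[bool], reference: Sequence[bool]) -> AgreementScores:
    """Compare system decisions with the human reference, as percentages of all utterances."""
    if not system or len(system) != len(reference):
        raise ValueError("system and reference must be non-empty and of equal length")
    n = len(system)
    pairs = list(zip(system, reference))
    hit = sum(s and r for s, r in pairs)
    rejection = sum((not s) and (not r) for s, r in pairs)
    false_alarm = sum(s and (not r) for s, r in pairs)
    miss = n - hit - rejection - false_alarm

    def pct(count: int) -> float:
        return 100.0 * count / n

    return AgreementScores(pct(hit), pct(rejection), pct(false_alarm), pct(miss))


if __name__ == "__main__":
    # Toy data: three raters judging four utterances
    raters = [
        [True, True, False, False],
        [True, False, False, True],
        [True, True, False, False],
    ]
    reference = majority_vote(raters)  # [True, True, False, False]
    result = score([True, False, True, False], reference)
    print(result, result.hit_plus_rejection)
```

On this toy input each of the four outcomes occurs once, so Hit + Rejection is 50.0. Under these assumptions, a "gain of 2.9 points" is simply the difference in `hit_plus_rejection` between the adapted and unadapted systems on the same utterances.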