This paper presents research results obtained while building a speaker-independent hybrid speech recognizer. The recognizer will be integrated as a phrase recognizer in a medical-pharmaceutical information system. It consists of two recognition components: an adapted commercial Microsoft Spanish speech recognizer and a locally developed recognizer based on hidden Markov models that implements Lithuanian acoustic models. The efficiency of both components was evaluated on multiple speaker-independent speech recognition tasks. The Lithuanian recognizer was more accurate on average, reaching a phrase error rate of 0.6% for user requests in the medical-pharmaceutical domain. The adapted commercial Spanish recognizer was able to improve on the accuracy of the Lithuanian recognizer in the worst recognition scenarios. These results confirm the hypothesis behind the hybrid recognition approach: recognition errors from different recognizers built using different techniques are not strongly correlated. This fact could be exploited to improve overall speech recognition accuracy.
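The two quantities discussed above, phrase error rate and the degree to which two recognizers fail on the same phrases, can be illustrated with a minimal sketch. The function names and the toy phrases are hypothetical and are not taken from the paper; the sketch assumes each recognizer outputs exactly one hypothesis per reference phrase and that a phrase counts as an error unless it matches the reference exactly.

```python
def phrase_error_rate(references, hypotheses):
    """Fraction of phrases whose hypothesis differs from the reference."""
    if len(references) != len(hypotheses) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sum(ref != hyp for ref, hyp in zip(references, hypotheses))
    return errors / len(references)


def error_overlap(references, hyp_a, hyp_b):
    """Return (errors of A, errors of B, errors made by both)."""
    if not (len(references) == len(hyp_a) == len(hyp_b)):
        raise ValueError("all inputs must have equal length")
    a_wrong = {i for i, (r, h) in enumerate(zip(references, hyp_a)) if r != h}
    b_wrong = {i for i, (r, h) in enumerate(zip(references, hyp_b)) if r != h}
    return len(a_wrong), len(b_wrong), len(a_wrong & b_wrong)


# Toy data (hypothetical): each recognizer fails on a different phrase.
refs = ["dose", "price", "stock", "usage"]
rec_a = ["dose", "rice", "stock", "usage"]
rec_b = ["dose", "price", "sock", "usage"]

per_a = phrase_error_rate(refs, rec_a)
overlap = error_overlap(refs, rec_a, rec_b)
```

When the third value returned by `error_overlap` is small relative to the first two, the recognizers rarely fail on the same phrase, which is the situation a hybrid scheme can take advantage of.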
Abstract. Computerized systems with voice user interfaces could save time and ease the work of healthcare practitioners. To achieve this goal, a voice user interface should be reliable (recognizing commands with sufficiently high accuracy) and properly designed (convenient for the user). The paper deals with implementation issues of a hybrid approach to voice command recognition. By hybrid approach we mean the combination of several different recognition methods to achieve higher recognition accuracy. The experimental results show that most voice commands are recognized sufficiently well, but there is a subset of voice commands that is harder to recognize. This paper proposes a novel method, based on the Ripper algorithm, for combining several recognition methods. Experimental evaluation showed that this method achieves higher recognition accuracy than a blind combination rule.
This paper presents a corpus-driven approach to building a computational model of fundamental frequency, or F0, for the Lithuanian language. The model was obtained by training the HMM-based speech synthesis system HTS on six hours of speech from multiple speakers. Several gender-specific models, using different parameters and different contextual factors, were investigated. The models were evaluated by synthesizing F0 contours and comparing them to the original F0 contours using two criteria: root mean square error (RMSE) and voicing classification error. The HMM-based models improved on the RMSE of the mean-based model, which predicted the F0 of a vowel from its average normalized pitch.
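The two evaluation criteria named above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used in the paper: contours are assumed to be frame-aligned lists of F0 values in Hz, a value of 0 is assumed to mark an unvoiced frame, RMSE is computed only over frames voiced in both contours, and voicing classification error is taken as the fraction of frames where the voiced/unvoiced decisions disagree. The function name `f0_errors` is hypothetical.

```python
import math


def f0_errors(reference, synthesized):
    """Return (RMSE in Hz over frames voiced in both, voicing error rate).

    Assumes frame-aligned contours where 0 marks an unvoiced frame.
    """
    if len(reference) != len(synthesized) or not reference:
        raise ValueError("contours must be non-empty and of equal length")

    pairs = list(zip(reference, synthesized))

    # Frames where one contour is voiced and the other is not.
    mismatches = sum((r > 0) != (s > 0) for r, s in pairs)
    voicing_error = mismatches / len(pairs)

    # RMSE is only meaningful where both contours carry an F0 value.
    voiced = [(r, s) for r, s in pairs if r > 0 and s > 0]
    if voiced:
        rmse = math.sqrt(sum((r - s) ** 2 for r, s in voiced) / len(voiced))
    else:
        rmse = float("nan")
    return rmse, voicing_error


# Toy contours (hypothetical values): one voicing mismatch in four frames.
original = [100.0, 110.0, 0.0, 0.0]
predicted = [103.0, 106.0, 0.0, 120.0]
rmse, voicing_error = f0_errors(original, predicted)
```

Restricting the RMSE to frames voiced in both contours keeps voicing mistakes from inflating the pitch error; those mistakes are counted separately by the voicing error rate.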