His research interests include natural language processing, information retrieval, machine learning, and its applications.
Supervised disambiguation using a large amount of corpus data delivers better performance than other word sense disambiguation methods. However, constructing large-scale, sense-tagged corpora is difficult because it is costly and time-consuming. On the other hand, unsupervised disambiguation is relatively easy to implement, although most such efforts have produced unsatisfactory results. A primary reason for the performance degradation of unsupervised disambiguation is that the semantic occurrence probability of ambiguous words is unavailable, so a data deficiency problem arises when determining the dependency between words. This paper proposes an unsupervised disambiguation method that uses a prior probability estimation based on the Korean WordNet and performs better than supervised disambiguation. In the Korean WordNet, every word has semantic characteristics similar to those of its related words. We therefore assume that the dependency between two words equals the dependency between their related words, and resolve the data deficiency problem by computing the χ² statistic between related words. Moreover, to obtain the same effect as using the semantic occurrence probability as a prior probability, as supervised disambiguation does, we collect semantically related words of the ambiguous word and use them as prior probability data. We conducted experiments on Korean, English, and Chinese to evaluate the performance of the proposed lexical disambiguation method. The proposed method outperformed supervised disambiguation methods even though it is unsupervised (using a knowledge-based approach).
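The abstract's core statistical tool is the χ² statistic computed between (related) words to measure their dependency. A minimal sketch of that computation from a 2×2 co-occurrence contingency table is shown below; the counts and the helper name `chi_square_2x2` are invented for illustration, and the paper itself aggregates counts over WordNet-related words rather than the raw word pair.

```python
# Hedged sketch: chi-square statistic of association between two words,
# computed from a 2x2 contingency table of context counts.
# All counts below are invented for illustration only.

def chi_square_2x2(a, b, c, d):
    """a: contexts with both words, b: only w1, c: only w2, d: neither."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den if den else 0.0

# Example: in 1000 contexts, w1 and w2 co-occur 30 times.
score = chi_square_2x2(30, 70, 50, 850)
print(round(score, 2))  # → 73.07 (high score suggests dependency)
```

A larger χ² value indicates a stronger association between the two words; in the paper's setting, a data-sparse pair can borrow the statistic computed over its WordNet-related words.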
Speech processing technology has great potential in the medical field to provide beneficial solutions for both patients and doctors. Speech interfaces, represented by speech synthesis and speech recognition, can be used to transcribe medical documents, control medical devices, assist people with speech and hearing impairments, and aid the visually impaired. However, accurate and natural speech synthesis requires predicting prosody phrase boundaries. This study proposes a method for building a reliable learning corpus to train deep learning-based prosody boundary prediction models. In addition, we present a way to derive a rule-based model that predicts prosody boundaries from the constructed corpus, and we use its output to train a deep learning-based model. As a result, we built a coherent corpus even though many workers participated in its development: the estimated pairwise agreement of the corpus annotations is between 0.7477 and 0.7916, and the kappa coefficient (κ) is between 0.7057 and 0.7569. The deep learning-based model trained on the rules obtained from the corpus achieved a prediction accuracy of 78.57% for the three-level prosody phrase boundary and 87.33% for the two-level prosody phrase boundary.
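The corpus-quality figures above (pairwise agreement and the kappa coefficient) are standard inter-annotator agreement measures. A minimal sketch of Cohen's kappa for two annotators labeling prosody boundaries is given below; the labels (`"B"` for boundary, `"N"` for none) and the function name `cohens_kappa` are illustrative assumptions, not the paper's actual annotation scheme.

```python
# Hedged sketch: Cohen's kappa for two annotators' boundary labels.
# Kappa corrects raw (pairwise) agreement for chance agreement.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: 10 tokens labeled boundary ("B") / no boundary ("N").
a = ["B", "N", "B", "B", "N", "N", "B", "N", "N", "N"]
b = ["B", "N", "B", "N", "N", "N", "B", "N", "B", "N"]
print(round(cohens_kappa(a, b), 3))  # → 0.583
```

Kappa values in the paper's reported range (roughly 0.71 to 0.76) are conventionally read as substantial agreement, which supports the claim that the multi-worker corpus is coherent.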