Ontology has become a hot research topic in areas of artificial intelligence such as knowledge representation, knowledge engineering and natural language processing (NLP). Motivated by the application requirements of an intelligent Uyghur information retrieval system, this paper briefly describes ontology and its construction rules, methods, tools and description languages, presents a contrastive analysis of the current state of ontology research at home and abroad, and then summarizes some key issues in the Uyghur ontology construction procedure along with some early achievements. Finally, directions for further research are proposed.
In the internet age, ontology has become a research hotspot as a conceptual model of knowledge organisation. Ontology extension expands an existing ontology by adding new concepts and discovering the relationships between its concepts. To improve the automation and accuracy of ontology concept extraction in the Uyghur language, the authors propose a new method to automatically extract concepts from a text collection. Given the characteristics of Uyghur domain ontology concepts, text preprocessing is performed first, then inter-word correlation is calculated by fusing multiple features, such as Mi, Cd and Ea, and finally the domain terminology and concepts are automatically extracted according to the term frequency-inverse document frequency algorithm. Experimental results show that, in terms of precision and recall, the proposed multi-feature method represents an improvement over other methods. They also demonstrate the feasibility and effectiveness of the authors' method for extracting domain concepts.
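The final ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `tf_idf_scores` and `pmi` and the toy English token lists are hypothetical, the documents are assumed to be already preprocessed into tokens, and pointwise mutual information is used only as one example of an inter-word correlation measure (the abstract does not define Mi, Cd or Ea, so no claim is made that it matches any of them).

```python
import math
from collections import Counter


def tf_idf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def pmi(docs):
    """Pointwise mutual information of adjacent word pairs (illustrative only)."""
    unigrams = Counter(word for doc in docs for word in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (a, b): math.log(
            (count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni))
        )
        for (a, b), count in bigrams.items()
    }


# Hypothetical toy corpus standing in for preprocessed Uyghur text.
docs = [
    ["ontology", "concept", "extraction", "ontology"],
    ["bilingual", "dictionary", "extraction"],
    ["ontology", "concept", "relation"],
]
scores = tf_idf_scores(docs)
# Candidate terms of the first document, highest tf-idf first.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)
pair_scores = pmi(docs)
```

In a pipeline of the kind the abstract outlines, word pairs with a high correlation score would be merged into multi-word candidates before the tf-idf ranking selects domain terms; how the features are actually fused is not stated in the source.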
Bilingual lexicon extraction is useful, especially for low-resource languages that can leverage high-resource languages. The Uyghur language is a derivative language, and its language resources are scarce and noisy. Moreover, it is difficult to find bilingual resources that make it possible to use the linguistic knowledge of high-resource languages such as Chinese or English. There is little related research on unsupervised extraction for the Chinese-Uyghur language pair, and the existing methods mainly focus on term extraction from translated parallel corpora. Accordingly, unsupervised knowledge extraction methods are effective, especially for low-resource languages. This paper proposes a method to extract a Chinese-Uyghur bilingual dictionary by combining the inter-word relationship matrix mapped by neural cross-lingual word embedding vectors. A seed dictionary is used as a weak supervision signal. A small Chinese-Uyghur parallel data resource is used to map the multilingual word vectors into a unified vector space. As the word-level units of the two languages do not align well, stems are used as the main linguistic units. The strong inter-word semantic relationships of word vectors are used to associate Chinese-Uyghur semantic information. Two retrieval metrics, nearest neighbor retrieval and cross-domain similarity local scaling, are used to calculate similarity and extract bilingual dictionaries. The experimental results show that the accuracy of the proposed Chinese-Uyghur bilingual dictionary extraction method improves to 65.06%. This method helps to improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translation.
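The two retrieval metrics named above can be illustrated with a short sketch. It assumes the source and target stem vectors have already been mapped into a shared space, and it uses the commonly cited form of cross-domain similarity local scaling, CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity of x to its k nearest target neighbors and r_S(y) the mean similarity of y to its k nearest source neighbors. All function names, the value of k and the toy vectors are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(src, tgt):
    """sim[i][j] = cosine similarity of source vector i and target vector j."""
    return [[cosine(s, t) for t in tgt] for s in src]


def mean_top_k(values, k):
    """Mean of the k largest values (fewer if the list is shorter)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def csls(sim, k=2):
    """Rescale cosine similarities by the local neighborhood density."""
    r_src = [mean_top_k(row, k) for row in sim]        # r_T(x) per source word
    r_tgt = [mean_top_k(col, k) for col in zip(*sim)]  # r_S(y) per target word
    return [
        [2 * sim[i][j] - r_src[i] - r_tgt[j] for j in range(len(r_tgt))]
        for i in range(len(r_src))
    ]


def best_match(score_matrix):
    """Index of the highest-scoring target for each source word."""
    return [max(range(len(row)), key=row.__getitem__) for row in score_matrix]


# Hypothetical stem vectors, assumed to be already mapped into one space.
src_vecs = [[1.0, 0.0], [0.0, 1.0]]
tgt_vecs = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]

sim = similarity_matrix(src_vecs, tgt_vecs)
nn_pairs = best_match(sim)          # plain nearest neighbor retrieval
csls_pairs = best_match(csls(sim))  # CSLS retrieval
```

Plain nearest neighbor retrieval ranks targets by raw cosine similarity, while CSLS penalizes target vectors that lie close to many source vectors, which is the usual motivation for using it in dictionary induction.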