Abstract: In recent years, research on Wikification, which aims to promote the effective reuse of Wikipedia resources and the understanding of document contents, has attracted much attention. Wikification is a method that automatically extracts keywords from a document and links each of them to an appropriate Wikipedia article. It consists of two processes: first, keywords are extracted from a document; second, the appropriate Wikipedia article is identified for each keyword. In this paper, we focus on the first process, the extraction of keywords from a document for Wikification. Research on Wikification has been conducted for documents in a variety of languages. We focus on East Asian language documents and experiment with Japanese documents. In addition, we plan to perform Wikification not only within a single language but also across languages (e.g., keywords in Japanese documents are linked to appropriate English Wikipedia articles). Our proposed method consists of two steps. First, we extract nouns from a document using a morphological analysis tool and extract candidate keywords with a method called Top Consecutive Nouns Cohesion (TCNC). TCNC connects consecutive nouns and treats them as one compound word. Second, we rank the extracted candidate keywords using one of two measures of keyword importance, the Dice coefficient or Keyphraseness. In our experiments on extracting appropriate keywords for Wikification in Japanese documents, our proposed method, especially the combination of TCNC and Keyphraseness, achieved the best results.
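The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the document has already been tokenized and part-of-speech tagged by a morphological analyzer (the `"NOUN"` tag and the `(surface, pos)` tuple format are hypothetical), it covers only the joining of consecutive nouns and leaves out any further TCNC details the abstract does not state, and it uses the commonly cited definition of Keyphraseness: the number of Wikipedia articles in which a term appears as a link, divided by the number of articles in which it appears at all. The function names and the two count dictionaries are illustrative assumptions; the Dice-based ranking is omitted because the abstract does not say how it is computed.

```python
def merge_consecutive_nouns(tagged_tokens):
    """Join each maximal run of consecutive nouns into one candidate keyword.

    tagged_tokens: iterable of (surface, pos) pairs; the "NOUN" tag is an
    assumed convention, not the output format of any particular analyzer.
    """
    candidates = []
    run = []
    for surface, pos in tagged_tokens:
        if pos == "NOUN":
            run.append(surface)
        elif run:
            # Japanese is written without spaces, so the run is concatenated.
            candidates.append("".join(run))
            run = []
    if run:
        candidates.append("".join(run))
    return candidates


def keyphraseness(term, link_doc_counts, doc_counts):
    """Fraction of the articles containing `term` in which it is a link."""
    total = doc_counts.get(term, 0)
    if total == 0:
        return 0.0
    return link_doc_counts.get(term, 0) / total


def rank_candidates(candidates, link_doc_counts, doc_counts):
    """Deduplicate candidates and sort them by descending Keyphraseness."""
    unique = list(dict.fromkeys(candidates))
    return sorted(
        unique,
        key=lambda t: keyphraseness(t, link_doc_counts, doc_counts),
        reverse=True,
    )


# Hypothetical tagged input and made-up counts, for illustration only.
tokens = [("自然", "NOUN"), ("言語", "NOUN"), ("処理", "NOUN"),
          ("の", "PARTICLE"), ("研究", "NOUN")]
link_counts = {"自然言語処理": 8, "研究": 1}
doc_counts = {"自然言語処理": 10, "研究": 100}

candidates = merge_consecutive_nouns(tokens)
ranking = rank_candidates(candidates, link_counts, doc_counts)
```

With these made-up counts, the three consecutive nouns are merged into the single compound 自然言語処理, which ranks above 研究 because it is linked in a larger share of the articles that mention it.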