To understand the user experience in social media, or to facilitate the design of human-centric services on social media, users' opinions about specific entities in text messages must be captured. A fine-grained named entity recognizer (NER) is an essential module for identifying opinion targets in text messages, and a named-entity (NE) dictionary is a major resource that affects the performance of an NER. However, constructing an NE dictionary manually is difficult because human annotation is time-consuming and labor-intensive. To reduce construction time and labor, we propose a semi-automatic system that constructs an NE dictionary from a free online resource, Wikipedia. The proposed system constructs a pseudodocument for each Wikipedia NE by using an active-learning technique. It then classifies Wikipedia entries into NE classes based on the similarities between the entries and the pseudodocuments in a vector space. In experiments, the proposed system classified 92.3% of Wikipedia entries into 29 NE classes. It performed well, with a macro-averaging F1-measure of 0.872 and a micro-averaging F1-measure of 0.935.
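The similarity-based classification step can be illustrated with a minimal sketch. It is not the system described above: the bag-of-words vectors, the cosine measure, the `threshold` cutoff for leaving an entry unclassified, the two class names, and the toy pseudodocument texts are all assumptions made for illustration, and the active-learning construction of pseudodocuments is not shown.

```python
import math
from collections import Counter


def vectorize(text):
    """Map text to a term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(entry_text, pseudodocs, threshold=0.1):
    """Return the NE class whose pseudodocument is most similar to the entry.

    Returns None when no class reaches the (hypothetical) threshold,
    modelling entries that are left unclassified.
    """
    entry_vec = vectorize(entry_text)
    best_class, best_score = None, 0.0
    for ne_class, doc_text in pseudodocs.items():
        score = cosine(entry_vec, vectorize(doc_text))
        if score > best_score:
            best_class, best_score = ne_class, score
    return best_class if best_score >= threshold else None


# Toy pseudodocuments, invented for this example only.
pseudodocs = {
    "PERSON": "born actor singer politician author player career",
    "LOCATION": "city country river capital region located population",
}

label = classify("a city located in the northern region of the country", pseudodocs)
unknown = classify("xyz", pseudodocs)
```

Here `label` holds the best-matching class for the sample entry, and `unknown` is `None` because the entry shares no terms with either pseudodocument. A real system would likely use weighted terms (for example TF-IDF) and far larger pseudodocuments.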