To understand user experience in social media, or to support the design of human-centric services built on social media, we need to capture users' opinions about specific entities mentioned in text messages. A fine-grained named entity recognizer (NER) is an essential module for identifying opinion targets in such messages, and a named-entity (NE) dictionary is a major resource that affects NER performance. However, constructing an NE dictionary manually is difficult, because human annotation is time-consuming and labor-intensive. To reduce construction time and labor, we propose a semi-automatic system that constructs an NE dictionary from a free online resource, Wikipedia. The proposed system builds a pseudodocument for each NE class by using an active-learning technique. It then classifies Wikipedia entries into NE classes based on the similarities between the entries and the pseudodocuments in a vector space. In experiments, the proposed system classified 92.3% of Wikipedia entries into 29 NE classes, achieving a macro-averaged F1-measure of 0.872 and a micro-averaged F1-measure of 0.935.
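The classification step described above (comparing each entry with per-class pseudodocuments in a vector space) can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not specify: vectors are raw term-frequency bags of lowercased whitespace tokens, similarity is cosine similarity, each pseudodocument is simply the merged text of a few hand-labeled seed entries (the active-learning step used to build pseudodocuments is not reproduced), and entries whose best similarity falls below a hypothetical `threshold` are left unclassified. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def tf_vector(text: str) -> Counter:
    """Term-frequency vector over lowercased whitespace tokens (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_pseudodocuments(seeds: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Hypothetical simplification: merge seed-entry texts into one vector per NE class."""
    return {label: tf_vector(" ".join(texts)) for label, texts in seeds.items()}


def classify(entry_text: str, pseudodocs: Dict[str, Counter],
             threshold: float = 0.1) -> Optional[str]:
    """Return the NE class whose pseudodocument is most similar, or None below threshold."""
    vec = tf_vector(entry_text)
    best_label, best_score = None, 0.0
    for label, pdoc in pseudodocs.items():
        score = cosine(vec, pdoc)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


seeds = {
    "CITY": ["Seoul is the capital city of South Korea",
             "Busan is a port city in South Korea"],
    "SINGER": ["IU is a South Korean singer and songwriter",
               "a singer who released an album"],
}
pseudodocs = build_pseudodocuments(seeds)
print(classify("Incheon is a metropolitan city in South Korea", pseudodocs))
```

With these toy seeds the entry shares more weighted terms with the CITY pseudodocument than with the SINGER one, so it is assigned to CITY; an entry with no overlapping terms gets `None`, i.e. it stays out of the dictionary.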
In smart homes, information appliances interact with residents via social network services. To capture residents' intentions, these appliances must analyze short text messages, which are typically entered on small mobile devices. However, most information appliances have hardware constraints such as small memory, limited battery capacity, and restricted processing power. It is therefore difficult to embed intelligent applications based on natural language processing (NLP) techniques, which traditionally require large memory and high-end processors, into information appliances. To overcome this problem, lightweight NLP modules are needed. We propose an automatic word spacing system designed for information appliances with limited hardware resources; word spacing is the first processing step of NLP for many languages that have their own word spacing rules. The proposed system consists of a word spacing dictionary and a pattern-matching module. When a sentence is entered, the pattern-matching module inserts spaces simply by looking up the word spacing dictionary in a back-off manner. In comparative experiments with previous models, the proposed method showed low memory usage (0.79 MB) and high character-unit accuracy (0.9460) without requiring complex arithmetic computations. On the basis of these experiments, we conclude that the proposed system is suitable for information appliances with limited hardware resources.
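The dictionary-plus-back-off idea described above can be sketched as follows, as an illustration rather than the authors' implementation. The abstract does not say what the dictionary keys are, so this sketch assumes each key is a character context of up to `max_ctx` characters on each side of a candidate boundary, and each value is the majority space/no-space decision observed in a spaced training corpus. At inference time the lookup backs off from the widest context to the narrowest and defaults to "no space" when nothing matches, so only dictionary lookups are needed. Function names and the toy English corpus are hypothetical and chosen for readability.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[int, str, str]  # (context width, left context, right context)


def strip_spaces(spaced: str) -> Tuple[str, Set[int]]:
    """Remove spaces; return the text and the indices where a word starts (except 0)."""
    words = spaced.split()
    starts, pos = set(), 0
    for word in words:
        starts.add(pos)
        pos += len(word)
    starts.discard(0)
    return "".join(words), starts


def build_dictionary(corpus: List[str], max_ctx: int = 2) -> Dict[Key, bool]:
    """Count space decisions per context and keep the majority decision (assumed scheme)."""
    counts: Dict[Key, Counter] = defaultdict(Counter)
    for sentence in corpus:
        text, starts = strip_spaces(sentence)
        for i in range(1, len(text)):
            for k in range(1, max_ctx + 1):
                key = (k, text[max(0, i - k):i], text[i:i + k])
                counts[key][i in starts] += 1
    return {key: c[True] > c[False] for key, c in counts.items()}


def insert_spaces(text: str, dictionary: Dict[Key, bool], max_ctx: int = 2) -> str:
    """Insert spaces by looking up contexts from widest to narrowest (back-off)."""
    text = "".join(text.split())
    out = [text[0]] if text else []
    for i in range(1, len(text)):
        space = False  # default decision when no context is in the dictionary
        for k in range(max_ctx, 0, -1):
            key = (k, text[max(0, i - k):i], text[i:i + k])
            if key in dictionary:
                space = dictionary[key]
                break
        if space:
            out.append(" ")
        out.append(text[i])
    return "".join(out)


corpus = ["the cat sat", "the dog sat", "a cat ran"]
dictionary = build_dictionary(corpus)
print(insert_spaces("acatsat", dictionary))
```

Here "acatsat" never appears in the training corpus, yet every boundary context does, so the sketch recovers "a cat sat". For "hecat", two boundaries have no width-2 match and fall back to width-1 contexts, which is where the back-off matters. Larger `max_ctx` values trade memory for more precise contexts.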