The importance of named entities in information retrieval and knowledge management has recently drawn interest in characterizing semantic relationships between entities. In this paper, we propose a method for measuring semantic similarity, an important type of semantic relationship, between entities. The method is based on Google Directory, a search interface to the Open Directory Project. Via the search engine, we locate the web pages relevant to an entity and automatically create a profile of the entity from the directory assignments of those pages, which capture various features of the entity. Using these profiles, the semantic similarity between entities can be measured along different dimensions. We apply the semantic similarity measurement to two knowledge acquisition tasks: thesaurus construction for entities and fine-grained categorization of entities. Our experiments demonstrate that the proposed method works effectively in both tasks.
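The profile-and-compare idea described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how profiles are weighted or which similarity function is used, so the category counts, the `depth` truncation, and cosine similarity are stand-ins chosen here, and the directory paths are made-up examples rather than real Google Directory output.

```python
from collections import Counter
from math import sqrt


def build_profile(directory_paths, depth=2):
    """Count how often each directory category (cut to `depth` levels) occurs.

    `directory_paths` is a hypothetical list of category paths such as
    "Computers/Software/Databases", one per web page found for the entity.
    """
    profile = Counter()
    for path in directory_paths:
        levels = path.strip("/").split("/")
        profile["/".join(levels[:depth])] += 1
    return profile


def cosine_similarity(p, q):
    """Cosine similarity between two category-count profiles (assumed measure)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile shares nothing with any other profile
    return dot / (norm_p * norm_q)


# Made-up directory assignments for two entities.
entity_a = build_profile([
    "Computers/Software/Databases",
    "Computers/Software/Operating_Systems",
    "Business/Information_Technology",
])
entity_b = build_profile([
    "Computers/Software/Databases",
    "Computers/Hardware/Storage",
])
print(cosine_similarity(entity_a, entity_b))
```

Changing `depth` is one simple way to compare entities at a coarser or finer level of the directory hierarchy; whether this matches the "dimensions" the authors have in mind is not stated in the abstract.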
Question answering (QA) systems aim at finding answers to questions posed in natural language using a collection of documents. When the collection is extracted from the Web, the structure and style of the texts are quite different from those of newspaper articles. We developed a QA system based on an answer validation process able to handle Web specificity. A large number of candidate answers are extracted from short passages in order to be validated according to question and passage characteristics. The validation module is based on a machine learning approach. It takes into account criteria characterizing both passage and answer relevance at the surface, lexical, syntactic and semantic levels to deal with different types of texts. We present and compare results obtained for factual questions posed against a Web collection and a newspaper collection. We show that our system outperforms a baseline by up to 48% in MRR.
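For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages, over all questions, the reciprocal of the rank at which the first correct answer appears, counting zero when no correct answer is returned. The sketch below shows the standard definition only; the function name, the exact-match test, and the toy data are assumptions made for illustration and do not reflect the authors' evaluation setup.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """Standard MRR over a set of questions.

    ranked_answers: one ranked list of candidate answers per question.
    gold_answers:   one set of acceptable answers per question.
    Exact string matching is an assumption made for this sketch.
    """
    if not ranked_answers:
        return 0.0
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold:
                total += 1.0 / rank  # only the first correct answer counts
                break
    return total / len(ranked_answers)


# Toy data: correct at rank 1, correct at rank 2, and never correct.
system_output = [["Paris", "Lyon"], ["1912", "1914"], ["Everest"]]
gold = [{"Paris"}, {"1914"}, {"K2"}]
print(mean_reciprocal_rank(system_output, gold))  # (1 + 1/2 + 0) / 3 = 0.5
```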