This paper introduces category and link expansion strategies for the XML Entity Ranking track at INEX 2007. Category expansion propagates coefficients through the Wikipedia category hierarchy, starting either from given categories or from categories derived from sample entities. Link expansion exploits the links between Wikipedia articles. Both strategies are evaluated on the entity ranking and list completion tasks.
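The abstract does not give the propagation rule, so the following is only an illustrative sketch of what coefficient propagation over a category hierarchy might look like. The function name `expand_categories`, the downward (parent to subcategory) direction, the geometric `decay` factor, and the `max_depth` cutoff are all assumptions made for this example and are not taken from the paper.

```python
def expand_categories(children, seeds, decay=0.5, max_depth=2):
    """Assign a coefficient to each category reachable from the seeds.

    children  -- dict mapping a category to its list of subcategories
    seeds     -- the given (or derived) starting categories
    decay     -- hypothetical per-level damping factor (an assumption)
    max_depth -- hypothetical limit on how far coefficients spread
    """
    # Seed categories receive the full coefficient.
    scores = {cat: 1.0 for cat in seeds}
    frontier = list(seeds)
    for depth in range(1, max_depth + 1):
        coeff = decay ** depth
        next_frontier = []
        for cat in frontier:
            for sub in children.get(cat, []):
                # Keep the largest coefficient seen for a category;
                # this also stops cycles from looping indefinitely.
                if coeff > scores.get(sub, 0.0):
                    scores[sub] = coeff
                    next_frontier.append(sub)
        frontier = next_frontier
    return scores


hierarchy = {
    "Countries": ["European countries"],
    "European countries": ["Nordic countries"],
}
weights = expand_categories(hierarchy, ["Countries"])
```

Under these assumptions, an entity's score could then be weighted by the coefficient of the categories it belongs to, so that entities in the seed categories count most and entities in distant subcategories count least.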
In contemporary query languages, the user is responsible for navigating among semantically related data. Given the huge amount of data and the complex structural relationships among data in modern applications, it is unrealistic to assume that the user fully knows the content and structure of the available information. Several query languages aim to facilitate navigation in databases whose structure is unknown. However, these languages assume that the user knows how the data in the structure at hand are semantically related to each other. So far, little attention has been paid to how unknown semantic associations among available data can be discovered. We address this problem in this article. A semantic association between two entities can be constructed if the database contains a sequence of explicitly expressed relationships that connects the two entities. This sequence may pass through several other entities, through which the original entities are connected indirectly. We introduce an expressive and declarative query language for discovering semantic associations. Our query language can, for example, discover semantic associations between entities for which only some characteristics are known. Further, it integrates the manipulation of semantic associations with the manipulation of documents that may contain information on the entities involved in those associations.
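The definition of a semantic association as a connecting sequence of explicit relationships can be illustrated with a small path search. This is a minimal sketch and not the query language described in the article: the triple representation, the function name `find_associations`, the `max_len` bound, and the sample data are all hypothetical choices made for the example.

```python
from collections import deque


def find_associations(triples, source, target, max_len=3):
    """Return simple paths of at most max_len relationships from source to target.

    triples -- iterable of (subject, relationship, object) facts
    Each path alternates entities and direction-tagged relationship labels.
    """
    # Relationships may be traversed in either direction, so index both ends
    # and remember the direction in which each one was originally stated.
    adjacent = {}
    for subj, rel, obj in triples:
        adjacent.setdefault(subj, []).append((rel, obj, "->"))
        adjacent.setdefault(obj, []).append((rel, subj, "<-"))

    paths = []
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        if node == target and len(path) > 1:
            paths.append(path)
            continue
        # Entities sit at even positions, so this counts relationships used.
        if (len(path) - 1) // 2 >= max_len:
            continue
        for rel, nxt, direction in adjacent.get(node, []):
            if nxt in path[::2]:
                continue  # do not revisit an entity already on this path
            queue.append((nxt, path + [direction + rel, nxt]))
    return paths


facts = [
    ("alice", "authored", "paper1"),
    ("bob", "authored", "paper1"),
    ("bob", "works_at", "uni"),
    ("alice", "works_at", "uni"),
]
associations = find_associations(facts, "alice", "bob")
```

In this toy data set, `alice` and `bob` are not directly related, but they are connected indirectly through the intermediate entities `paper1` and `uni`, which is the kind of indirect connection the abstract describes.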
In the query language we introduced in Part I (Niemi & Jämsen, in press), the user can formulate queries to discover (possibly complex) semantic relationships among entities. In this article we demonstrate the use of the query language and discuss the new applications that it supports. We categorize several query types and give sample queries. The query types are categorized according to whether the entities specified in a query are known or unknown to the user in advance, and whether text information in documents is utilized. Query results are presented in natural language to help the user interpret them correctly. We briefly discuss issues related to the prototype implementation of the query language and show that an independent operation such as Rho (Sheth et al., 2005; Anyanwu & Sheth, 2002, 2003), which presupposes that the entities of interest are known in advance, is exceedingly inefficient at emulating the behavior of our query language. The discussion also covers potential problems and challenges for future work.