Formulating high-quality queries is a key aspect of context-based search. However, determining the effectiveness of a query is challenging because multiple objectives, such as high precision and high recall, are usually involved. In this work, we study techniques that can be applied to evolve contextualized queries when the criteria for determining query quality are based on multiple objectives. We report on the results of three different strategies for evolving queries: (a) single-objective, (b) multiobjective with Pareto-based ranking, and (c) multiobjective with aggregative ranking. After a comprehensive evaluation with a large set of topics, we discuss the limitations of the single-objective approach and observe that both the Pareto-based and aggregative strategies are highly effective for evolving topical queries. In particular, our experiments lead us to conclude that the multiobjective techniques are superior to a baseline as well as to well-known and ad hoc query reformulation techniques.
Introduction
Context-based search is the process of seeking material based on a topic of interest (Budzik, Hammond, & Birnbaum, 2001; Kraft, Chang, Maghoul, & Kumar, 2006; Maguitman, Leake, & Reichherzer, 2005). Consider, for example, a journalist writing an article about the H1N1 flu pandemic. The journalist has collected a small set of articles related to the topic at hand and would like to retrieve additional material from other sources. This local collection of documents is indexed and tagged as relevant while other documents are added to the index and tagged as irrelevant. The journalist can be assisted by an intelligent system that monitors the journalist's task, generates an initial set of queries, and incrementally refines these queries to better reflect the topic of interest. The initial queries could be generated directly from the journalist's context (e.g., the document that is being edited) or from a short description provided explicitly by the journalist. These initial queries will be incrementally refined based on the small collection of readily available material, which contains documents related to the H1N1 flu pandemic. In subsequent steps, the refined topical queries are used by the system to retrieve relevant material from a larger corpus containing novel material, such as the Web.
The availability of powerful search interfaces makes it possible to develop a plethora of applications for information access in context such as the one described earlier. The matching and ranking mechanisms employed by existing search services are commonly fixed by the service provider. The criteria considered for retrieval change from service to service (e.g., content relevance, document popularity, document freshness, meta-information) and are typically obscure to those using the interface. The only access point to relevant material is through the submission of queries.
As a consequence, learning to automatically formulate effective topical queries is an important research problem in the area of context-based search.
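To make the multiobjective setting concrete, the following minimal sketch scores a few candidate queries against a small tagged collection and compares the three ways of ranking them. It is an illustration only, not the implementation evaluated in this work: the document identifiers, the candidate queries, their retrieved sets, and the function names are all hypothetical, and the harmonic mean of precision and recall is assumed here as just one possible aggregation.

```python
from typing import Dict, List, Set, Tuple

Score = Tuple[float, float]  # (precision, recall)


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Score:
    """Precision and recall of a retrieved set against the tagged relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is no worse on every objective and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Score]) -> List[str]:
    """Queries whose score is not dominated by the score of any other query."""
    return [
        q for q, s in scores.items()
        if not any(dominates(t, s) for p, t in scores.items() if p != q)
    ]


def harmonic_mean(score: Score) -> float:
    """One possible aggregation of the two objectives (assumed for illustration)."""
    p, r = score
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical local collection: documents tagged as relevant to the topic.
relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical candidate queries and the documents each one retrieves.
results = {
    "h1n1 flu": {"d1", "d2", "d5", "d6"},
    "h1n1 pandemic vaccine": {"d1"},
    "flu": {"d1", "d2", "d3", "d5", "d6", "d7"},
    "pandemic": {"d1", "d5"},
}

scores = {q: precision_recall(docs, relevant) for q, docs in results.items()}

# (a) Single-objective: rank by precision alone.
best_by_precision = max(scores, key=lambda q: scores[q][0])

# (b) Pareto-based: keep every non-dominated query.
front = pareto_front(scores)

# (c) Aggregative: collapse both objectives into a single value.
best_by_aggregate = max(scores, key=lambda q: harmonic_mean(scores[q]))
```

In this toy setting the precision-only choice is a narrow query that retrieves a single relevant document, whereas the Pareto front retains both that query and a broader one with higher recall, and the aggregated score prefers the broader query. This mirrors, on a very small scale, why a single objective can be a limiting criterion for query quality.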
JOURNAL ...