Evaluation is central to research and development in information retrieval (IR). In addition to designing and implementing new retrieval mechanisms, one must also show through rigorous evaluation that they are effective. A major focus in IR is the capability of retrieval mechanisms to rank relevant documents optimally for the user, given a query. In practice, however, searching for information involves human searchers and is highly interactive. When human searchers have been incorporated in evaluation studies, the results have often suggested that better ranking does not necessarily lead to better search task, or work task, performance. It is therefore not clear which system or interface features should be developed to improve the effectiveness of human task performance. In the present article, we focus on the evaluation of task-based information interaction (TBII). We give special emphasis to learning tasks in order to discuss TBII in more concrete terms. Information interaction is here understood as behavioral and cognitive activities related to task planning, searching for information items, selecting between them, working with them, and synthesizing and reporting. These five generic activities contribute to task performance and outcome and can be supported by information systems. As a step toward task-based evaluation, we introduce program theory as the evaluation framework. Such evaluation can investigate whether, and how, a program consisting of TBII activities and tools works, and can further provide a causal description of the program's (in)effectiveness. Our goal in the present article is to structure TBII on the basis of the five generic activities and to consider the evaluation of each activity within the program theory framework. Finally, we combine these activity-based program theories into an overall evaluation framework for TBII. Such an evaluation is complex due to the large number of factors affecting information interaction. Instead of presenting tested program theories, we illustrate how the evaluation of TBII can be accomplished with the program theory framework, evaluating systems and behaviors, and their interactions, comprehensively in context.
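To make the activity-based structure more concrete, the following is a minimal Python sketch of how the five generic TBII activities could be represented and aggregated into a task-level summary. The activity names follow the text; the `effort` and `quality` measures and the aggregation are illustrative assumptions, not the authors' operationalization of program theory.

```python
# Hypothetical representation of the five generic TBII activities with
# per-activity outcome measures, aggregated into a simple task-level summary.
from dataclasses import dataclass

ACTIVITIES = ["planning", "searching", "selecting", "working", "synthesizing"]

@dataclass
class ActivityOutcome:
    activity: str   # one of ACTIVITIES
    effort: float   # assumed measure, e.g. minutes spent on the activity
    quality: float  # assumed measure, e.g. assessed outcome quality in [0, 1]

def evaluate_task(outcomes: list[ActivityOutcome]) -> dict:
    """Aggregate per-activity outcomes into a task-level summary."""
    total_effort = sum(o.effort for o in outcomes)
    mean_quality = sum(o.quality for o in outcomes) / len(outcomes)
    return {"total_effort": total_effort, "mean_quality": mean_quality}

if __name__ == "__main__":
    session = [ActivityOutcome(a, effort=5.0, quality=0.7) for a in ACTIVITIES]
    print(evaluate_task(session))
```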
In real life, information retrieval consists of sessions of one or more query iterations. Each iteration involves several subtasks, such as query formulation, result scanning, document link clicking, document reading and judgment, and stopping. Each of these subtasks has behavioral factors associated with it, including search goals and cost constraints, query formulation strategies, scanning and stopping strategies, and relevance assessment behavior. Traditional IR evaluation focuses on retrieval and result presentation methods, and on interaction within a single-query session. In the present study, we assess the effects of these behavioral factors on retrieval effectiveness. Our research questions ask how effective human behavior employing various search strategies is, compared to various baselines, under different search goals and time constraints. We examine both ideal and fallible human behavior and seek to identify robust behaviors, if any. Methodologically, we use extensive simulation of human behavior in a test collection. We find that (a) human multi-query sessions may exceed comparable single-query sessions in effectiveness, (b) the same empirically observed behavioral patterns are reasonably effective under various search goals and constraints, but (c) they remain, on average, clearly below the best possible ones. Moreover, no session-level behavioral pattern comes even close to winning in most cases; the information need (or topic), in relation to the test collection, is a determining factor.
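As a rough illustration of the kind of session simulation described above, the sketch below lets a simulated searcher issue a sequence of queries, scan each result list to a fixed depth, and stop once a search goal or a cost budget is met. The `run_query` stand-in, the scan depth, and the cost model are assumptions for illustration only; the study's actual simulator and test-collection setup are not reproduced here.

```python
# Simplified multi-query session simulation under a search goal and cost budget.
import random

def simulate_session(queries, run_query, qrels, goal=10, budget=50, scan_depth=10):
    """Simulate one multi-query session; return (relevant found, documents scanned)."""
    found, cost = set(), 0
    for q in queries:
        ranking = run_query(q)              # assumed: returns a ranked list of doc ids
        for doc_id in ranking[:scan_depth]:
            cost += 1
            if doc_id in qrels:             # qrels: set of relevant doc ids
                found.add(doc_id)
            if len(found) >= goal or cost >= budget:
                return found, cost
    return found, cost

# Toy usage with a random "retrieval system" stand-in.
docs = [f"d{i}" for i in range(1000)]
qrels = set(random.sample(docs, 30))
run_query = lambda q: random.sample(docs, 100)
found, cost = simulate_session(["q1", "q2", "q3"], run_query, qrels)
print(f"relevant found: {len(found)}, documents scanned: {cost}")
```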
There is overwhelming evidence that real users of IR systems often prefer extremely short queries (one or two individual words) but try out several queries if needed. Such behavior is fundamentally different from the process modeled in traditional test-collection-based IR evaluation, which uses more verbose queries and only one query per topic. In the present paper, we propose an extension to test-collection-based evaluation that utilizes sequences of short queries based on empirically grounded but idealized session strategies. We employ TREC data and have test persons suggest search words, while simulating sessions based on the idealized strategies for repeatability and control. The experimental results show that, surprisingly, web-like very short queries (including sequences of one-word queries) typically lead to good-enough results even in a TREC-type test collection. This finding helps explain the observed real-user behavior: since a few very simple attempts normally lead to good-enough results, there is no need to invest more effort. We conclude by discussing the consequences of this finding for IR evaluation.
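An idealized short-query session strategy of the kind evaluated above can be pictured with the sketch below: one-word queries (for example, keywords suggested by test persons) are issued in sequence until the top of the ranked list looks good enough. The good-enough criterion used here (at least `good_enough` relevant documents in the top `top_n`) is an assumed operationalization, not necessarily the one used in the paper.

```python
# Idealized session strategy: issue one-word queries until results are "good enough".
def short_query_session(keywords, run_query, qrels, top_n=10, good_enough=1):
    """Issue one-word queries in order; stop as soon as the result is good enough."""
    for i, word in enumerate(keywords, start=1):
        top = run_query(word)[:top_n]               # assumed: ranked list of doc ids
        hits = sum(1 for d in top if d in qrels)    # qrels: set of relevant doc ids
        if hits >= good_enough:
            return {"queries_used": i, "relevant_in_top": hits}
    return {"queries_used": len(keywords), "relevant_in_top": 0}
```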
The authors are grateful to the Academy of Finland for financial support of the present research (grants 202185, 206568, 200844, 80771, and 204978).
Most models, measures, and simulations assume that a searcher will stop at a predetermined place in a ranked list of results. Over the course of a search session, however, real-world searchers vary and adapt their interactions with a ranked list. These interactions depend upon a variety of factors, including the content and quality of the results returned and the searcher's information need. In this paper, we perform a preliminary simulation-based analysis of the influence of stopping strategies when query quality varies. In the context of ad hoc topic retrieval during a multi-query search session, we examine the influence of fixed and adaptive stopping strategies on overall performance. Surprisingly, we find that a fixed strategy can perform as well as the examined adaptive strategies, but the fixed depth needs to be adjusted depending on the querying strategy used. Further work is required to explore how well the stopping strategies reflect actual search behaviour, and to determine whether one stopping strategy is dominant.
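The two families of stopping strategies discussed above can be sketched as follows: a fixed-depth rule that always scans a set number of results, and a simple adaptive "frustration" rule that stops after a run of consecutive non-relevant results. Both rules and the tolerance parameter are illustrative assumptions; the paper's exact adaptive strategies may differ.

```python
# Fixed-depth vs. a simple adaptive (frustration-based) stopping rule.
def fixed_stop(ranking, qrels, depth=10):
    """Scan exactly `depth` results; return the relevant documents seen."""
    return [d for d in ranking[:depth] if d in qrels]

def adaptive_stop(ranking, qrels, tolerance=3):
    """Stop after `tolerance` consecutive non-relevant results."""
    found, misses = [], 0
    for d in ranking:
        if d in qrels:
            found.append(d)
            misses = 0
        else:
            misses += 1
            if misses >= tolerance:
                break
    return found
```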