Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval 2015
DOI: 10.1145/2766462.2767728

User Variability and IR System Evaluation

Abstract: Test collection design eliminates sources of user variability to make statistical comparisons among information retrieval (IR) systems more affordable. Does this choice unnecessarily limit generalizability of the outcomes to real usage scenarios? We explore two aspects of user variability with regard to evaluating the relative performance of IR systems, assessing effectiveness in the context of a subset of topics from three TREC collections, with the embodied information needs categorized against three levels …


Cited by 62 publications (49 citation statements)
References 33 publications
“…Carterette and colleagues used a mixed-effect model to account for variance due to both the topic sample and an effect they called the user effect, which represents differences in how patient different users are with respect to finding relevant documents [10]. Bailey and colleagues characterize a similar sort of user effect to make recommendations regarding test collection design [2] but do not incorporate that variance within significance testing. In work that is most similar to this article, Robertson and Kanoulas produced multiple simulated measurements per run-topic combination and used the replicate measurements in a mixed-effects model to test for statistically significant differences between system pairs [22].…”
Section: Introduction (mentioning)
confidence: 99%
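The user-effect idea quoted above can be pictured with a toy variance decomposition: simulate per-user replicate scores for each system-topic pair, then compare how much spread each factor contributes. Everything below (factor sizes, effect magnitudes, the `factor_var` helper) is invented for illustration and is only a crude stand-in for the mixed-effects models the cited papers actually fit.

```python
import random
import statistics

random.seed(7)
n_systems, n_topics, n_users = 2, 10, 5

# Invented effect sizes: topic difficulty dominates, then per-user
# "patience", then the system difference -- magnitudes are illustrative.
topic_eff = [random.gauss(0.5, 0.10) for _ in range(n_topics)]
system_eff = [0.00, 0.05]
user_eff = [random.gauss(0.0, 0.05) for _ in range(n_users)]

# One simulated effectiveness score per (system, topic, user) replicate.
score = {
    (s, t, u): topic_eff[t] + system_eff[s] + user_eff[u] + random.gauss(0, 0.02)
    for s in range(n_systems) for t in range(n_topics) for u in range(n_users)
}

def factor_var(index, levels):
    """Variance of one factor's marginal means -- a rough proxy for the
    variance component a mixed-effects model would estimate."""
    means = [statistics.mean(v for k, v in score.items() if k[index] == lvl)
             for lvl in range(levels)]
    return statistics.pvariance(means)

topic_var = factor_var(1, n_topics)
user_var = factor_var(2, n_users)
system_var = factor_var(0, n_systems)
print(f"topic variance:  {topic_var:.5f}")
print(f"user variance:   {user_var:.5f}")
print(f"system variance: {system_var:.5f}")
```

The point of accounting for the user component is that ignoring it inflates apparent system differences; a real analysis would fit all components jointly rather than reading off marginal means.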
“…By considering the SERP as a whole, this provides a way to model abandonment within the search process, rather than assuming that a searcher will assess the first snippet specifically. This therefore marks a departure from assumptions encoded within many Information Retrieval (IR) models and measures, such as P@k, RBP [24], and INST [1, 23, 31]. The motivation for including this additional decision point stems from empirical research (i.e.…”
Section: Updating the Complex Searcher Model (mentioning)
confidence: 99%
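The RBP measure named in this excerpt models a searcher who moves from each result to the next with a fixed persistence probability p, so later ranks are discounted geometrically. A minimal sketch (the example relevance vector is invented):

```python
def rbp(relevances, p=0.8):
    """Rank-biased precision: (1 - p) normalises the geometric series of
    per-rank inspection probabilities p^(i-1); relevances are in rank order."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(relevances))

# A run whose relevant documents sit at ranks 1 and 3:
print(rbp([1, 0, 1, 0, 0], p=0.8))  # ≈ 0.328
```

Here p=0.8 corresponds to a moderately patient user; more persistent users are modelled with larger p, shifting weight toward deeper ranks.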
“…First, it is an important real-world information-seeking task in the medical domain that underpins the success of clinical trials, which are critical for the advancement of science and medicine. Finally, there is higher task complexity in clinical search (Koopman & Zuccon, 2014b) and research has shown that query variation is more significant with higher task complexity (Bailey, Moffat, Scholer, & Thomas, 2015).…”
Section: The Information Need: Searching For Clinical Trials (mentioning)
confidence: 99%
“…There is evidence from initial studies showing that variability in queries had as much impact on retrieval effectiveness as variability in systems (Bailey et al., 2015; Moffat, Scholer, Thomas, & Bailey, 2015b). Azzopardi (2009) noted that the effectiveness of an IR system was strongly influenced by the query submitted; they further went on to quantify the likely effort involved in submitting effective queries.…”
Section: Query Variability Is As Big As System Variability (mentioning)
confidence: 99%
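The finding quoted here — query variability rivaling system variability — can be pictured with a toy grid of scores: fix one topic, vary both the system and the query formulation, and compare the two spreads. The system names and numbers below are fabricated for illustration, not drawn from the cited studies.

```python
import statistics

# Hypothetical AP scores for one topic: each row is a system, each column a
# different user's query formulation of the same information need.
scores = {
    "sysA": [0.42, 0.18, 0.55, 0.31],
    "sysB": [0.47, 0.22, 0.50, 0.35],
}

# Spread across query variations, within each system:
within_system = {s: statistics.pstdev(v) for s, v in scores.items()}
# Spread across systems, holding each query variation fixed:
across_systems = [statistics.pstdev(col) for col in zip(*scores.values())]

print("per-system spread over query variations:", within_system)
print("per-variation spread over systems:", across_systems)
```

In this fabricated grid the spread induced by query formulation is several times the spread between systems for any fixed query, which is the shape of result the excerpt describes.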