Using Replicates in Information Retrieval Evaluation

Voorhees, Ellen M.; Samarov, Daniel V.; Soboroff, Ian

doi:10.1145/3086701

Cited by 49 publications (30 citation statements)

References 34 publications

“…ANOVA is a statistical method used to check whether the means of two or more groups differ significantly from each other. It was widely used in the 1990s to explore the results of TREC IR runs [21,25] and has recently been revived [12,24]. For a thorough treatment of ANOVA, we refer readers to Miller's book [15] or Ferro et al. [12].…”
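As a minimal sketch of the idea described in this statement (our illustration, not code from any of the cited works), a one-way ANOVA compares the variance between group means with the variance within groups. The function name `one_way_anova_f` and the score values are hypothetical; each group could be the per-topic scores of one system.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score groups.

    Each group is a list of scores, e.g. per-topic effectiveness of one system.
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand = mean(x for g in groups for x in g)
    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Mean squares use k - 1 and n - k degrees of freedom
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical per-topic scores for two systems
    system_a = [0.2, 0.3, 0.4]
    system_b = [0.5, 0.6, 0.7]
    print(round(one_way_anova_f([system_a, system_b]), 6))  # 13.5
```

A large F suggests the group means differ; the p-value comes from the F distribution with (k - 1, n - k) degrees of freedom, which `scipy.stats.f_oneway` computes directly. Note that this one-way form ignores the topic factor; IR analyses usually include topic as a second factor.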

Section: Data Analysis Objectives and Methods (mentioning; confidence: 99%)

Studying the Variability of System Setting Effectiveness by Data Analytics and Visualization

Déjean, Mothe, Ullah

2019

Lecture Notes in Computer Science

Search engines differ in their modules and parameters; defining the optimal system setting is all the more challenging because of the complexity of the retrieval pipeline. The main goal of this study is to determine which system components and parameters matter most in system setting, and thus which should be tuned first. We carry out an extensive analysis of 20,000 different system settings applied to three TREC ad hoc collections. Our analysis zooms in and out of the data using various data analysis methods such as ANOVA, CART, and data visualization. We found that the query expansion model is the most significant component affecting system effectiveness, consistently across collections. Zooming in on the queries, we show that when only easy queries are considered, the most significant component becomes the retrieval model. The results of our study are directly reusable by system designers and for system tuning.

“…More recently, Voorhees et al. [60] conducted experiments in which they randomly split TREC collections into shards, thus creating more replicates for each (topic, system) pair and allowing them to examine topic*system interactions. By modeling the interactions, Voorhees et al. were able to find more significant differences between retrieval systems.…”
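To make the role of replicates concrete, here is a minimal sketch (our illustration, not the authors' code) of a balanced two-way ANOVA with replicates. The function name and the `scores[t][s][r]` layout are hypothetical. With a single score per (topic, system) pair, the topic*system interaction cannot be separated from error; with R > 1 replicates per cell (for example, one per random shard), the interaction can be estimated and tested against the within-cell error.

```python
from itertools import product


def two_way_anova_with_replicates(scores):
    """Split the variation in scores[t][s][r] into topic, system,
    topic*system interaction, and error sums of squares.

    scores[t][s][r] is the score of system s on topic t in replicate r
    (for example, r indexes a random shard of the collection).
    Assumes a balanced design: every (topic, system) cell has R replicates.
    """
    T, S, R = len(scores), len(scores[0]), len(scores[0][0])
    cells = list(product(range(T), range(S)))
    grand = sum(x for t, s in cells for x in scores[t][s]) / (T * S * R)
    cell = [[sum(scores[t][s]) / R for s in range(S)] for t in range(T)]
    topic = [sum(cell[t]) / S for t in range(T)]
    system = [sum(cell[t][s] for t in range(T)) / T for s in range(S)]

    ss_topic = S * R * sum((m - grand) ** 2 for m in topic)
    ss_system = T * R * sum((m - grand) ** 2 for m in system)
    # Interaction: what is left of each cell mean after removing main effects
    ss_inter = R * sum((cell[t][s] - topic[t] - system[s] + grand) ** 2
                       for t, s in cells)
    # Error: spread of replicates around their own cell mean
    ss_error = sum((x - cell[t][s]) ** 2 for t, s in cells for x in scores[t][s])

    df_inter = (T - 1) * (S - 1)
    df_error = T * S * (R - 1)  # zero when R == 1: no replicates, no error term
    f_inter = None
    if df_inter > 0 and df_error > 0 and ss_error > 0:
        f_inter = (ss_inter / df_inter) / (ss_error / df_error)
    return {"ss_topic": ss_topic, "ss_system": ss_system,
            "ss_inter": ss_inter, "ss_error": ss_error, "f_inter": f_inter}


if __name__ == "__main__":
    # Hypothetical data: 2 topics x 2 systems x 2 shards. System 0 wins on
    # topic 0, system 1 wins on topic 1, so the interaction is strong.
    scores = [[[0.5, 0.6], [0.3, 0.4]],
              [[0.2, 0.3], [0.6, 0.7]]]
    print(round(two_way_anova_with_replicates(scores)["f_inter"], 6))  # 36.0
```

The interaction is tested against the within-cell error mean square, which holds whether topics are treated as fixed or random. The right denominator for the system effect depends on that modeling choice (error mean square if fixed, interaction mean square if topics are random), so the sketch leaves the system test out.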

Section: ANOVA (mentioning; confidence: 99%)

“…Such statistical processes model scores as a combination of factors and factor interactions. The models were extended to include a topic*system interaction [6,45,60].…”

Section: Introduction (mentioning; confidence: 99%)

Using Collection Shards to Study Retrieval Performance Effect Sizes

Ferro, Kim, Sanderson

2019

ACM Trans. Inf. Syst.

Despite the bulk of research studying how to compare the performance of IR systems more accurately, less attention has been devoted to understanding the different factors that play a role in such performance and how they interact. This is the case for shards, i.e., partitions of a document collection into sub-parts, which are used for many purposes, ranging from efficiency to selective search to more accurate test collection evaluation. In all these cases, there is empirical knowledge supporting the importance of shards, but we lack models that allow us to measure the impact of shards on system performance and how they interact with topics and systems. We use the general linear mixed model framework and present a model that encompasses the experimental factors of system, topic, shard, and their interaction effects. This detailed model allows us to estimate differences between the effects of various factors more accurately. We study shards created by a range of methods used in prior work, explain observations noted in prior work in a principled setting, and offer new insights. Notably, we discover that the topic*shard interaction effect is large across almost all datasets, an observation that, to our knowledge, has not been measured before.
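Schematically, a linear model with the factors named in this abstract can be written as below. This is a hedged illustration in our own notation, not necessarily the exact specification used by Ferro, Kim, and Sanderson: $y_{ijk}$ denotes the score of system $j$ on topic $i$ in shard $k$.

```latex
y_{ijk} = \mu + \tau_i + \alpha_j + \sigma_k
        + (\tau\alpha)_{ij} + (\tau\sigma)_{ik} + (\alpha\sigma)_{jk}
        + \varepsilon_{ijk}
```

Here $\mu$ is the grand mean; $\tau_i$, $\alpha_j$, and $\sigma_k$ are the topic, system, and shard effects; the parenthesized terms are the two-way interactions (including the topic*shard term $(\tau\sigma)_{ik}$); and $\varepsilon_{ijk}$ is the error. With a single score per (topic, system, shard) triple, the three-way interaction cannot be separated from the error, so it is absorbed into $\varepsilon_{ijk}$. In a mixed-model formulation, some of these effects (for example, topic) may be treated as random.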

“…Therefore, [16] used simulation based on distributions of relevant and non-relevant documents to demonstrate the importance of the Topic*System interaction effect. Very recently, [19] exploited random partitions of the document corpus to obtain more replicates of each (topic, system) pair, obtaining an estimate of the Topic*System interaction effect that allowed for improved precision in determining the System effect.…”

Section: Performance Factor Analysis in IR (mentioning; confidence: 99%)

The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction

Ferro, Fuhr, Grefenstette, et al.

2018

SIGIR Forum

Self Cite

This paper reports the findings of the Dagstuhl Perspectives Workshop 17442 on performance modeling and prediction in the domains of Information Retrieval, Natural Language Processing, and Recommender Systems. We present a framework for further research, which identifies five major problem areas: understanding measures, performance analysis, making underlying assumptions explicit, identifying application features that determine performance, and developing prediction models that describe the relationship between assumptions, features, and resulting performance.
