2008
DOI: 10.1145/1480506.1480527

Low-cost and robust evaluation of information retrieval systems

Abstract: Research in Information Retrieval has progressed against a background of rapidly increasing corpus size and heterogeneity, with every advance in technology quickly followed by a desire to organize and search more unstructured, more heterogeneous, and even bigger corpora. But as retrieval problems get larger and more complicated, evaluating the ranking performance of a retrieval engine gets harder: evaluation requires human judgments of the relevance of documents to queries, and for very large corpora the cost …

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
16
0

Year Published

2010
2010
2018
2018

Publication Types

Select...
3
3

Relationship

0
6

Authors

Journals

Cited by 9 publications (16 citation statements); references 68 publications.
“…Carterette showed that the mean and variance for precision at k and average precision have analytical forms [6]. Given a query Q ∈ Q, these analytical forms are:…”
Section: Interval Estimates of Reusability
confidence: 99%
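
The excerpt is cut off before the formulas themselves. As a hedged sketch of what those analytical forms look like, assume (as in the minimal-test-collections setting) that the relevance of the document at rank i is a Bernoulli random variable X_i with P(X_i = 1) = p_i, independent across documents; the notation here is mine, not the citing paper's. Precision at k then has

\mathbb{E}[\mathrm{Prec}@k] = \frac{1}{k}\sum_{i=1}^{k} p_i,
\qquad
\operatorname{Var}[\mathrm{Prec}@k] = \frac{1}{k^{2}}\sum_{i=1}^{k} p_i\,(1 - p_i).

Average precision can likewise be written as a quadratic form in the relevance variables, roughly AP \propto \sum_{i \le j} X_i X_j / \max(i, j) up to normalization by the number of relevant documents, so its mean and variance follow from the first and second moments of the X_i; the exact coefficients and the treatment of the normalizer are given in [6].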
“…Although we primarily focus on precision at k and average precision in this paper, it should be noted that analytical forms for the means and variances of other retrieval metrics exist, including recall and NDCG [6]. Thus, our interval-based reusability measures can be easily applied to these metrics, as well.…”
Section: Confidence Intervals
confidence: 99%
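
A hedged illustration of how an interval estimate could be formed from such a mean and variance (the normal approximation here is my assumption, not necessarily the construction used in the cited work): treat the estimated metric \hat{\mu} as approximately normal and take

\hat{\mu} \pm z_{1-\alpha/2}\,\sqrt{\widehat{\operatorname{Var}}[\hat{\mu}]},
\qquad\text{e.g. } \hat{\mu} \pm 1.96\,\sqrt{\widehat{\operatorname{Var}}[\hat{\mu}]}\ \text{for a 95\% interval.}

Since [6] gives the mean and variance for precision at k, average precision, recall, and NDCG, the same construction carries over to each of those metrics.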
“…This has led researchers to pursue low cost strategies for constructing manual test collections. Two emerging evaluation paradigms are minimal test collections [8,7,6] and crowdsourcing [2]. Both of these strategies are useful for low-cost one-time evaluations.…”
Section: Pseudo Test Collections
confidence: 99%
“…The queries are either sampled from query logs or manually generated. Each query is then issued to one or more retrieval systems, which returns candidate documents that are then judged, either via pooling [21,37], the minimal test collection paradigm [8,7,6], or crowdsourcing [2].…”
Section: Pseudo Test Collections
confidence: 99%
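
As a minimal sketch of the pooling step mentioned above: in depth-k pooling, the documents judged for a query are the union of the top k results from each participating system. The function and variable names below are hypothetical, chosen for illustration rather than taken from the cited papers.

def depth_k_pool(rankings, k=100):
    # rankings: dict mapping a system id to its ranked list of document ids
    # for a single query; returns the set of document ids sent to assessors.
    pool = set()
    for ranked_docs in rankings.values():
        pool.update(ranked_docs[:k])  # each system contributes its top k
    return pool

# Example with two systems and pool depth 2: yields {'d1', 'd2', 'd3'}.
pool = depth_k_pool({"sysA": ["d1", "d2", "d4"], "sysB": ["d2", "d3", "d5"]}, k=2)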
“…However, in more constrained research environments these options are not available, and relevance judgments are usually provided by humans. To reduce the cost of this potentially expensive process, researchers have developed low-cost evaluation strategies, including minimal test collections [2] and crowdsourcing [1]. Despite the usefulness of these strategies, the resulting relevance judgments cannot easily be "ported" to a new or different corpus.…”
Section: Introduction
confidence: 99%