Retrieval evaluation with incomplete relevance data

Ahlgren, Per; Grönqvist, Leif

doi:10.1145/1183614.1183773

Cited by 8 publications

(11 citation statements)

References 2 publications


“…Figure 1(a) shows that reducing the size of the qrels decreases the value of all measures, except for bpref and RankEff. So far, this is consistent with earlier findings [4] [1]. What might be surprising, however, is that bpref, contrary to earlier finding, does not exhibit a dramatic increase when the qrels are reduced.…”

Section: Incomplete Unbiased Judgements (supporting)

confidence: 91%

“…Evaluation accuracy with incomplete judgements under a given measure is usually evaluated by selecting a random subset of the judged documents and comparing the ranking produced according to the reduced set of judgements with the ranking produced according to the original judgements (cf. [1], [4], [15]). Such incomplete judgements do not favor any particular system.…”

Section: Related Work (mentioning)

confidence: 99%
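
The procedure described in the statement above (draw a random subset of the judged documents, re-score the systems, and compare the resulting system ranking with the one from the full judgements) can be sketched roughly as follows. This is a minimal illustration and not the method of any of the cited papers: the toy `RUNS` and `QRELS` data, the single-topic setup, the choice of average precision as the measure, and the use of Kendall's tau-a as the ranking comparison are all assumptions made for the example.

```python
import random
from itertools import combinations

# Hypothetical toy data: one topic, three systems, eight judged documents.
QRELS = {"d1": 1, "d2": 0, "d3": 1, "d4": 0, "d5": 1, "d6": 0, "d7": 0, "d8": 0}
RUNS = {
    "A": ["d1", "d3", "d5", "d2", "d4"],
    "B": ["d2", "d1", "d3", "d4", "d5"],
    "C": ["d2", "d4", "d6", "d1", "d7"],
}


def average_precision(ranking, relevant):
    """Average precision of one ranked list given the judged-relevant set."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def sample_qrels(qrels, fraction, rng):
    """Keep a uniformly random fraction of the judged documents."""
    docs = sorted(qrels)
    kept = rng.sample(docs, max(1, round(fraction * len(docs))))
    return {doc: qrels[doc] for doc in kept}


def score_systems(runs, qrels):
    """Score every system against the given (possibly reduced) judgements."""
    relevant = {doc for doc, rel in qrels.items() if rel}
    return {name: average_precision(run, relevant) for name, run in runs.items()}


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score dicts over the same systems."""
    concordant = discordant = 0
    for s, t in combinations(sorted(scores_a), 2):
        sign = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(scores_a) * (len(scores_a) - 1) / 2
    return (concordant - discordant) / pairs


if __name__ == "__main__":
    full = score_systems(RUNS, QRELS)
    reduced = score_systems(RUNS, sample_qrels(QRELS, 0.5, random.Random(0)))
    print("tau between full and reduced rankings:", kendall_tau(full, reduced))
```

Because the subset is drawn uniformly from all judged documents, the reduction does not favor any particular system, which is the "unbiased" setting the statement refers to. In practice the scores would be averaged over many topics and several random draws.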

“…Research in the area of incomplete judgements usually focuses on the case of unbiased judgements, where judgements are incomplete, but do not favor any particular system over another (see [1], [4], [15] for examples). In this paper, we address the problem of biased judgements and discuss how the bias can be removed from the judgements.…”

Section: Introduction (mentioning)

confidence: 99%


Reliable information retrieval evaluation with incomplete and biased judgements

Büttcher, Clarke, Yeung, et al. 2007

Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval


Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique. We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially if the set of manual judgements is rich in documents, but highly biased against some systems.

“…The evaluation metric bpref [6] is also bounded from below and the metric RankEff [2] is directly maximized when minimizing the number of mis-ranked document pairs in the ranked list. bpref was designed as a stable performance metric when relevance judgments are incomplete, and RankEff builds upon the bpref measure, taking into account all retrieved non-relevant documents.…”

Section: Bpref Rankeff and Misranked Document Pairs (mentioning)

confidence: 99%

“…bpref was designed as a stable performance metric when relevance judgments are incomplete, and RankEff builds upon the bpref measure, taking into account all retrieved non-relevant documents. Both measures are known to correlate well with average precision in TREC data [2,9,6] and bpref is currently reported in annual TREC evaluation results [8]. These metrics are defined as:…”

Section: Bpref Rankeff and Misranked Document Pairs (mentioning)

confidence: 99%
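
The definitions referred to in the statement above are cut off in this excerpt and are not reproduced here. As a rough illustration of why bpref is considered stable under incomplete judgements, the sketch below implements one commonly used per-topic formulation of bpref, in which each retrieved relevant document is penalized by the number of judged non-relevant documents ranked above it, capped and normalized by min(R, N). The function name, argument layout, and the min(R, N) variant are assumptions for this example; RankEff is not covered.

```python
def bpref(ranking, qrels):
    """bpref for a single topic (one common formulation, assumed here).

    ranking: document ids in retrieved order.
    qrels:   dict mapping each *judged* document id to True (relevant)
             or False (non-relevant). Unjudged documents are absent.
    """
    num_rel = sum(1 for rel in qrels.values() if rel)
    num_nonrel = len(qrels) - num_rel
    if num_rel == 0:
        return 0.0

    bound = min(num_rel, num_nonrel)  # cap and normalizer for the penalty
    nonrel_seen = 0                   # judged non-relevant docs seen so far
    total = 0.0
    for doc in ranking:
        if doc not in qrels:
            continue                  # unjudged documents are simply ignored
        if qrels[doc]:
            if bound > 0:
                total += 1.0 - min(nonrel_seen, bound) / bound
            else:
                total += 1.0          # no judged non-relevant docs at all
        else:
            nonrel_seen += 1
    return total / num_rel


if __name__ == "__main__":
    judged = {"a": True, "b": False, "c": True, "d": False}
    print(bpref(["a", "x", "b", "c"], judged))  # "x" is unjudged
```

The key design point is the `continue` on unjudged documents: inserting or removing unjudged documents in the ranking leaves the score unchanged, whereas measures such as average precision implicitly treat them as non-relevant.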

Fast learning of document ranking functions with the committee perceptron

Elsas, Carvalho, Carbonell 2008

Proceedings of the International Conference on Web Search and Web Data Mining - WSDM '08


This paper presents a new variant of the perceptron algorithm using selective committee averaging (or voting). We apply this algorithm to the problem of learning ranking functions for document retrieval, known as the "Learning to Rank" problem. Most previous algorithms proposed to address this problem focus on minimizing the number of misranked document pairs in the training set. The committee perceptron algorithm improves upon existing solutions by biasing the final solution towards maximizing an arbitrary rank-based performance metric. This method performs comparably to or better than two state-of-the-art rank learning algorithms, and also provides significant training time improvements over those methods, showing over a 45-fold reduction in training time compared to ranking SVM.


Uncertainty Representations for Information Retrieval with Missing Data

Jousselme, Maupin 2016

Fusion Methodologies in Crisis Management
