Proceedings of the 7th ACM International Conference on Web Search and Data Mining 2014
DOI: 10.1145/2556195.2556268

Exploiting user disagreement for web search evaluation

Abstract: To express a more nuanced notion of relevance as compared to binary judgments, graded relevance levels can be used for the evaluation of search results. Especially in Web search, users strongly prefer top results over less relevant results, and yet they often disagree on which are the top results for a given information need. Whereas previous works have generally considered disagreement as a negative effect, this paper proposes a method to exploit this user disagreement by integrating it into the evaluation pr…

Cited by 8 publications (16 citation statements)
References 22 publications
“…To estimate the impact of this assumption, previous work studies the disagreement of assessors on binary labels and its influence on search engine comparisons (Voorhees 2001), leading to the conclusion that search engine comparisons are stable even under substantial assessor disagreement. Demeester et al (2014) show that in a graded relevance setting, this disagreement is especially strong on the top relevance levels. The current paper explicitly models the disagreement between assessors and particular scenarios of user relevance, such as users that are only satisfied with top results, or users that are looking for any result that is at least marginally relevant.…”
Section: Introduction (mentioning)
confidence: 86%
“…Note that Demeester et al (2014) present evidence and a quantitative analysis of user disagreement, most of which will not be repeated here. For example, it was shown that for the FedWeb12 dataset (Nguyen et al 2012), the interassessor disagreement was much stronger than the intra-assessor disagreement.…”
Section: Introduction (mentioning)
confidence: 98%
“…where $\partial q/\partial S_{i,y}$ and $\partial F/\partial S_{i,y}$ can be obtained from Equation (4), (5) and (6). In order to take care of all the constraints of $S$, since the feasible set is the intersection of multiple linear inequalities, we can project the updated (possibly not feasible) $Z$ into the feasible set after each update, by finding the closest point $S$ in the feasible set $\Phi_\nu$, which is a quadratic optimization problem minimizing the Frobenius norm $\|S - Z\|_F$ with constraints.…”
Section: Now We Can Optimize (mentioning)
confidence: 99%
“…Das et al [5] addressed the interactions of opinions between people connected by networks. Demeester et al [6] discussed the disagreement between different users on assessment of web search results. Their studies also focus on understanding annotator behavior, but none of them consider the case when multiple data items are organized into a batch.…”
Section: Related Work (mentioning)
confidence: 99%
“…Demeester et al. [8] discuss the disagreement between different users on assessment of web search results.…”
Section: Related Work (mentioning)
confidence: 99%