2012
DOI: 10.1007/978-3-642-28997-2_16

On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents

Abstract: We consider the problem of acquiring relevance judgements for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers of unknown labelling accuracy. We use these labels to infer the document relevance based on two methods. The first method is the commonly used majority voting (MV), which determines the document relevance based on the label that received the most votes, t…
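As a concrete illustration of the majority voting (MV) baseline described in the abstract, the sketch below aggregates one document's worker labels by keeping the label with the most votes. The function name, the binary labelling convention, and the tie-breaking behaviour are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate one document's worker labels by majority vote.

    `labels` is a list of binary relevance labels (1 = relevant,
    0 = not relevant) collected from different workers. Ties are
    broken by whichever label appears first in the list.
    """
    return Counter(labels).most_common(1)[0][0]

# Three workers judge the same document; two say relevant.
print(majority_vote([1, 0, 1]))  # -> 1
```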

Cited by 53 publications (60 citation statements)
References 16 publications
“…However, other sources of uncertainty could be considered. Recent research is particularly concerned with measuring uncertainty of the systems' performance due to (i) partial relevance judgments [2,4,20] and (ii) errors in the relevance judgments made by human assessors [6,12]. Our future work will expand the theoretical model to incorporate additional sources of uncertainty and explore more general cost models for constructing test collections.…”
Section: Discussion
confidence: 99%
“…So, we set the range of d as [0.5, 3] with a step size of 0.5. For the criterion c, the reported c suggests that both NIST and crowdsourced assessors are conservative with NIST assessors being more conservative than the crowdsourced workers [11,12].…”
Section: Experiments Settings
confidence: 99%
“…It is well known that secondary assessors produce relevance judgments that differ from those that are or would be produced by primary assessors [4]. Whether there is a single secondary assessor, or a group of secondary assessors that are combined using sophisticated algorithms [5,6], there will be differences.…”
Section: Introduction
confidence: 99%
“…Hauff et al compared all the above methods across 16 different test collections (from TREC and elsewhere), finding Soboroff et al's random-voting method best on nine collections, and Nuray and Can's Condorcet method best on six [8]. Hosseini et al used the EM framework to solve the problem of acquiring relevance judgements for Book Search tasks through crowdsourcing when no true relevance labels are available [9].…”
Section: Related Work
confidence: 99%
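The EM framework referenced in the last citation statement is commonly realised as a Dawid-Skene style aggregation, in which document relevance and worker reliability are estimated jointly. The sketch below is a minimal toy version under assumed simplifications (binary labels, one symmetric accuracy per worker, a uniform relevance prior, and a complete label matrix); it is not the authors' exact model.

```python
import numpy as np

def em_aggregate(labels, n_iter=50):
    """Toy EM aggregation of crowd labels (Dawid-Skene style).

    `labels` is an array of shape (n_docs, n_workers) with 0/1
    entries; every worker is assumed to label every document.
    Returns P(relevant) per document and an estimated accuracy
    per worker.
    """
    prob = labels.mean(axis=1).astype(float)  # start from majority-vote fractions
    for _ in range(n_iter):
        # M-step: expected fraction of correct answers per worker.
        agree = prob[:, None] * labels + (1 - prob)[:, None] * (1 - labels)
        acc = agree.mean(axis=0).clip(1e-6, 1 - 1e-6)
        # E-step: posterior relevance, weighting each vote by worker accuracy.
        log_pos = np.where(labels == 1, np.log(acc), np.log(1 - acc)).sum(axis=1)
        log_neg = np.where(labels == 0, np.log(acc), np.log(1 - acc)).sum(axis=1)
        prob = 1.0 / (1.0 + np.exp(log_neg - log_pos))
    return prob, acc

votes = np.array([[1, 1, 0],
                  [0, 0, 1],
                  [1, 1, 1]])
posteriors, accuracies = em_aggregate(votes)
print(posteriors.round(2), accuracies.round(2))
```

Unlike plain majority voting, each iteration re-weights a worker's votes by that worker's estimated accuracy, so reliable workers influence the inferred relevance more than unreliable ones.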