2020
DOI: 10.1613/jair.1.12012

Annotator Rationales for Labeling Tasks in Crowdsourcing

Abstract: When collecting item ratings from human judges, it can be difficult to measure and enforce data quality due to task subjectivity and lack of transparency into how judges make each rating decision. To address this, we investigate asking judges to provide a specific form of rationale supporting each rating decision. We evaluate this approach on an information retrieval task in which human judges rate the relevance of Web pages for different search topics. Cost-benefit analysis over 10,000 judgments collected on …

citations
Cited by 26 publications
(25 citation statements)
references
References 69 publications
“…This is not a problem if the task is a definite-annotation task, where labels and data are determined in a one-to-one correspondence. However, for tasks where the annotation criterion is not explicitly and uniquely defined, it is difficult to gain uniformity among a large number of labels [13,14,15]. In particular, when the target data are highly specialized, such as medical data, the annotation criteria for tasks, such as diagnosing the presence or absence of a lesion, often depend on the knowledge and experience of the annotator [16].…”
Section: Need For Boosting In Making Datasets With Accurate Labels Fo... (mentioning)
confidence: 99%
“…Given the increasing demand for machine learning in recent years, it is undesirable to pay high costs for annotation work to create large datasets. Moreover, the distributed labor force on the Internet is currently utilized for annotation in many areas [13,14,15,18], as a massive labor force is required for making large, accurately annotated datasets.…”
Section: Need For Boosting In Making Datasets With Accurate Labels Fo... (mentioning)
confidence: 99%
“…Since the advent of crowdsourcing platforms such as MTurk (Buhrmester et al, 2011), the quality assessment of subjective annotations is a much-researched topic (e.g., Nguyen et al, 2016; Kutlu et al, 2020). Two persisting problems, however, are (1) the opacity of the analytical criteria employed by crowd annotators (especially when using disparate chord vocabularies) and (2) the question of how to assess the quality of annotation sets in which many labels do not coincide (for example in the case of diverging analytical granularities, see Subsection 3.2.1).…”
Section: An Alternative Procedures For Verifying Expert Annotations (mentioning)
confidence: 99%
“…The concept tag serves multiple purposes. First, it acts as a rationale (Kutlu et al, 2020; McDonnell et al, 2016), requiring workers to justify their answers and thus nudging them towards high-quality selections. Rationales also provide a form of transparency to help requesters better understand worker intent.…”
Section: Stage 1: Finding Ambiguous Examples (mentioning)
confidence: 99%