Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.422
Would you describe a leopard as yellow? Evaluating crowd-annotations with justified and informative disagreement

Abstract: Semantic annotation tasks contain ambiguity and vagueness and require varying degrees of world knowledge. Disagreement is an important indication of these phenomena. Most traditional evaluation methods, however, critically hinge upon the notion of inter-annotator agreement. While alternative frameworks have been proposed, they do not move beyond agreement as the most important indicator of quality. Critically, evaluations usually do not distinguish between instances in which agreement is expected and instances…

Cited by 5 publications (4 citation statements)
References: 20 publications
“…Given identical instructions and identical items, some annotators may focus on different attributes of the item or have a different interpretation of the labeling criteria. Understanding and modelling label uncertainty remains a compelling challenge in evaluating machine learning systems (Sommerauer, Fokkens, and Vossen, 2020; Resnick et al., 2021).…”
Section: Developments In
confidence: 99%
“…Recent studies that investigated disagreements in natural language inference (Pavlick and Kwiatkowski, 2019) and semantic annotation (Sommerauer et al., 2020) claim that disagreement in natural language evaluation is often expected due to ambiguity and variation of language. Therefore, a number of disagreements do not represent "errors" or "noise" but are fully legitimate.…”
Section: Related Work
confidence: 99%
“…Given identical instructions and identical items, some annotators may focus on different attributes of the item or have a different interpretation of the labeling criteria. Understanding and modelling label uncertainty remains a compelling challenge in evaluating machine learning systems [45].…”
Section: Introduction
confidence: 99%