Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
DOI: 10.18653/v1/2022.naacl-main.13
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

Abstract: Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are diverse valid beliefs about what the correct data labels should be. So far, dataset creators have acknowledged annotator subjectivity, but rarely actively managed it in the annotation process. This has led to partly-subjective datasets that fail to serve a clear downstream use. To address this issue, we propose two contrasting paradigms for data annotation. The descriptive paradigm…

Cited by 42 publications (35 citation statements) · References 20 publications

Citation statements, ordered by relevance:
“…In addition, researchers commonly use some notion of agreement to measure a task's subjectiveness, for instance computing inter-annotator agreement metrics such as Cohen's Kappa (Cohen, 1960) or Fleiss' Kappa (Fleiss, 1971) to measure the reliability of annotations. But when presenting final results on the downstream task, people usually rely on aggregated labels, which can conceal informative disagreement, and on evaluation metrics that are unaware of the task's subjectiveness (Röttger et al., 2022).…”
Section: Related Work (mentioning)
confidence: 99%
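Both kappa statistics named in the statement above are chance-corrected agreement measures with standard library implementations. A minimal sketch with invented toy labels (the data here is illustrative, not from any cited dataset): Cohen's Kappa compares exactly two annotators, while Fleiss' Kappa generalises to any fixed number of raters.

from sklearn.metrics import cohen_kappa_score
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa, aggregate_raters

# Hypothetical binary labels from two annotators on the same five items.
annotator_a = [1, 0, 1, 1, 0]
annotator_b = [1, 0, 0, 1, 0]

# Cohen's kappa: chance-corrected agreement between exactly two raters.
print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")

# Fleiss' kappa handles many raters; statsmodels expects an
# (n_items, n_categories) count table, which aggregate_raters builds
# from an (n_items, n_raters) matrix of labels.
ratings = np.array([
    [1, 1, 0],   # item 1: three raters' labels
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
])
table, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(table):.2f}")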
“…While in some cases it may help to resolve differences between annotators (Hagerer et al., 2021), it is often insightful to acknowledge and explore the subjectivity of labels assigned by people or groups (Leonardelli et al., 2021; Sap et al., 2022). A dataset with labels from individuals, termed descriptive annotations, will help us build models to better understand differences in people's views of socially acceptable behavior (Röttger et al., 2022). Lourie et al. (2021) first examined AITA, suggesting that the descriptive ethics contained in people's judgements could serve as a valuable resource for developing machines that can appropriately and safely interact with people.…”
Section: Related Work (mentioning)
confidence: 99%
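A descriptive dataset in this sense keeps each annotator's label rather than collapsing them into a single gold label. The sketch below uses a hypothetical record layout (field names and labels are invented for illustration) to show how majority-vote aggregation conceals a 3-2 split that per-annotator labels preserve.

from collections import Counter

# Hypothetical descriptive-style record: labels kept per annotator.
item = {
    "text": "You people are unbelievable.",
    "labels": {"ann_1": "offensive", "ann_2": "not_offensive",
               "ann_3": "offensive", "ann_4": "not_offensive",
               "ann_5": "offensive"},
}

# Majority-vote aggregation, the usual prescriptive-style collapse:
counts = Counter(item["labels"].values())
majority_label, _ = counts.most_common(1)[0]
print(majority_label)   # 'offensive'

# The 3-2 split that the single aggregated label conceals:
print(dict(counts))     # {'offensive': 3, 'not_offensive': 2}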
“…Recent research suggests embracing disagreement by developing multi-annotator architectures that capture differences in annotator perspective (Davani et al., 2022; Basile et al., 2021; Uma et al., 2021). While this approach better models how abuse is perceived, it is not suitable for content moderation, where one has to decide whether to remove a post, so a prescriptive paradigm is preferable (Röttger et al., 2022). Zufall et al. (2020) adopt a more objective approach, as they aim to detect content that is illegal according to EU legislation.…”
Section: Related Work (mentioning)
confidence: 99%
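One common realisation of such multi-annotator architectures is a shared encoder with one classification head per annotator. The PyTorch sketch below is a minimal illustration of that general idea under assumed dimensions, not the exact model of Davani et al. (2022) or the other cited works.

import torch
import torch.nn as nn

class MultiAnnotatorClassifier(nn.Module):
    """Shared encoder with one classification head per annotator."""

    def __init__(self, encoder: nn.Module, hidden_size: int,
                 n_annotators: int, n_classes: int):
        super().__init__()
        self.encoder = encoder  # e.g. a sentence encoder; stand-in here
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, n_classes) for _ in range(n_annotators)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.encoder(x)  # (batch, hidden_size)
        # One set of logits per annotator: (batch, n_annotators, n_classes)
        return torch.stack([head(h) for head in self.heads], dim=1)

# Toy usage with an invented MLP standing in for a text encoder.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
model = MultiAnnotatorClassifier(encoder, hidden_size=32,
                                 n_annotators=5, n_classes=2)
logits = model(torch.randn(4, 16))   # 4 items
print(logits.shape)                  # torch.Size([4, 5, 2])

Training such a model against each annotator's own labels preserves disagreement at inference time; aggregating the per-head predictions then becomes an explicit, inspectable choice rather than a property baked into the dataset.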