2021
DOI: 10.1613/jair.1.12752

Learning from Disagreement: A Survey

Abstract: Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption…

Cited by 51 publications (63 citation statements) · References 56 publications

“…3.1. In the case of ordered labels (e.g., our speech labels), mutual information, proposed by [24] as a good evaluation measure when learning with disagreement, is not appropriate as it neglects the labels' ordering. Proper performance measures in our case include ordinal Krippendorff's Alpha, which accommodates both the ordered nature of the labels (from normal to the most hateful, violent speech, and consequently a varying magnitude of disagreements), and class imbalance (where the Violent class is underrepresented).…”
Section: Methodological Implications
confidence: 99%
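The passage above argues for ordinal Krippendorff's Alpha when labels are ordered. As a minimal sketch of how such an evaluation can be run, the snippet below assumes the third-party `krippendorff` PyPI package; the three-point "normal < hateful < violent" scale and the toy annotation matrix are purely illustrative, not data from the cited work.

```python
# Sketch: ordinal agreement between model predictions and human annotations
# via Krippendorff's alpha with an ordinal level of measurement.
# Assumes `pip install krippendorff`; all data below is hypothetical.
import numpy as np
import krippendorff

# Ordered label scale: 0 = normal, 1 = hateful, 2 = violent (illustrative).
# One row per "rater": row 0 is the model, rows 1-2 are human annotators.
# np.nan marks a missing rating.
reliability_data = np.array([
    [0, 1, 2, 1, 0, 2],        # model predictions
    [0, 1, 2, 2, 0, 2],        # annotator A
    [0, 2, 2, 1, np.nan, 2],   # annotator B (one item unlabeled)
], dtype=float)

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="ordinal")
print(f"Ordinal Krippendorff's alpha: {alpha:.3f}")
```

Using the ordinal level of measurement penalizes a "normal vs. violent" disagreement more than a "normal vs. hateful" one, which is exactly the property the quoted passage asks for.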
“…A recent work proposed a data perspectivist approach to ground truthing and suggested a spectrum of possibilities ranging from the traditional gold standard to the so-called "diamond standard", in which multiple labels are kept throughout the whole ML pipeline [3]. It has also been observed that training directly from soft labels (i.e., distributions over classes) can achieve higher performance than training from aggregated labels under certain conditions (e.g., large datasets and high quality annotators) [24]. Studies in hate speech classification came to similar conclusions and showed that supervised models informed by different perspectives on the target phenomena outperform a baseline represented by models trained on fully aggregated data [1].…”
Section: Introduction
confidence: 99%
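The statement above contrasts training on soft labels (distributions over classes) with training on aggregated labels. The sketch below shows one common way this is done, assuming PyTorch; the linear stand-in model, batch shapes, and the example vote distributions are illustrative assumptions, not the survey's setup.

```python
# Sketch: training directly from soft labels (per-item label distributions)
# instead of a single aggregated hard label. Assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_features, num_classes = 128, 3
model = nn.Linear(num_features, num_classes)      # stand-in classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def soft_label_loss(logits, soft_targets):
    """Cross-entropy between the predicted distribution and the annotation
    distribution (equals KL divergence up to the targets' constant entropy)."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Toy batch: 4 items, each with a distribution over 3 classes obtained by
# normalizing raw annotator votes (e.g., 2 of 3 annotators chose class 1).
x = torch.randn(4, num_features)
soft_targets = torch.tensor([
    [1.00, 0.00, 0.00],
    [0.33, 0.67, 0.00],
    [0.00, 0.67, 0.33],
    [0.20, 0.40, 0.40],
])

logits = model(x)
loss = soft_label_loss(logits, soft_targets)
loss.backward()
optimizer.step()
```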
“…Our investigation opens avenues for additional experiments with advanced methods to improve transfer learning (Howard and Ruder, 2018; Jiang et al., 2020; Nguyen et al., 2021) and mitigate catastrophic forgetting (Kirkpatrick et al., 2017; Li and Hoiem, 2018; Thompson et al., 2019). Further, based on the analysis of classification errors, we suggest incorporating the annotators' (dis)agreement into the training of the model, e.g., by employing the full distributions of annotations, as opposed to the current majority approach (Uma et al., 2021).…”
Section: Conclusion and Directions
confidence: 95%
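The suggestion above is to feed the full distribution of annotations into training rather than a majority vote. A small sketch of that preprocessing step follows; the label set and the example votes are hypothetical.

```python
# Sketch: turning raw, disaggregated annotations for one item into either
# a majority-vote hard label or the full label distribution (soft target).
from collections import Counter
import numpy as np

LABELS = ["negative", "neutral", "positive"]   # hypothetical label set

def majority_label(annotations):
    """Common practice: collapse all annotators into one hard label."""
    return Counter(annotations).most_common(1)[0][0]

def label_distribution(annotations):
    """Alternative: keep the full distribution over labels as a soft target."""
    counts = Counter(annotations)
    return np.array([counts.get(label, 0) for label in LABELS],
                    dtype=float) / len(annotations)

annotations = ["neutral", "positive", "positive", "negative", "positive"]
print(majority_label(annotations))        # 'positive'
print(label_distribution(annotations))    # [0.2 0.2 0.6]
```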
“…with recent work emphasizing the importance of modeling annotators' disagreement in subjective tasks (Davani et al., 2022; Leonardelli et al., 2021; Uma et al., 2021) and initiatives supporting the release of disaggregated annotations in NLP (Abercrombie et al., 2022).…”
Section: Models
confidence: 99%
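One way disagreement is modeled in this line of work is a multi-task classifier with a separate output head per annotator. The sketch below illustrates that general idea only; it is not the cited papers' exact architecture, it assumes PyTorch, and the encoder size, annotator count, and class count are made up.

```python
# Illustrative multi-annotator model: a shared encoder with one classification
# head per annotator, so the model predicts each annotator's label and a
# disagreement signal can be read off the spread of the per-head outputs.
import torch
import torch.nn as nn

class MultiAnnotatorClassifier(nn.Module):
    def __init__(self, input_dim=128, hidden_dim=64,
                 num_classes=3, num_annotators=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # One lightweight head per annotator on the shared representation.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, num_classes) for _ in range(num_annotators)
        )

    def forward(self, x):
        h = self.encoder(x)
        # Shape: (num_annotators, batch, num_classes)
        return torch.stack([head(h) for head in self.heads])

model = MultiAnnotatorClassifier()
x = torch.randn(8, 128)
per_annotator_logits = model(x)
# Disagreement proxy: variance of the predicted distributions across heads.
probs = per_annotator_logits.softmax(dim=-1)
disagreement = probs.var(dim=0).sum(dim=-1)   # one score per item
print(disagreement.shape)  # torch.Size([8])
```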