2020
DOI: 10.21248/jlcl.34.2020.222
COLD: Annotation scheme and evaluation data set for complex offensive language in English

Abstract: This paper presents a new, extensible annotation scheme for offensive language data sets. The annotation scheme expands coverage beyond fairly straightforward cases of offensive language to address several cases of complex, implicit, and/or pragmatically triggered offensive language. We apply the annotation scheme to create a new Complex Offensive Language Data Set for English (COLD-EN). The primary purpose of this data set is to diagnose how well systems for automatic detection of abusive language are able to …

Cited by 9 publications (14 citation statements); references 24 publications.
“…Nonetheless, non-i.i.d. generalisation is a persisting challenge (Yin and Zubiaga, 2021b), because models tend to overfit to specific topics (Nejadgholi and Kiritchenko, 2020; Bourgeade et al., 2023), social media users (Arango et al., 2019), or keywords, such as slurs or pejorative terms (Dixon et al., 2018; Kennedy et al., 2020; Talat et al., 2018; Palmer et al., 2020; Kurrek et al., 2020). When such overt terms are missing, models often fail to detect hate speech (ElSherief et al., 2021).…”
Section: Hate Speech Detection (mentioning)
Confidence: 99%
“…Other taxonomies include more fine-grained categories of abuse. For example, the annotation scheme proposed by Palmer et al. (2020) revolves around the offensiveness of a message, the presence of slurs, adjectival nominalization, and distancing. Sanguinetti et al. (2018), instead, focus on hate messages against immigrants and annotate hate intensity, aggressiveness, offensiveness, irony and stereotypes.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Since abusive language online is a multi-faceted phenomenon and different categorizations of hate speech have been proposed over the years to account for different targets (Kumar et al., 2018; Vidgen & Yasseri, 2020; Palmer et al., 2020; Zampieri et al., 2020), we aim on the one hand to take the complexity of online abuse into account, and on the other hand to be compatible with existing annotation schemes as much as possible. We therefore build upon the hierarchical taxonomy proposed in Zeinert, Inie & Derczynski (2021), which has been designed to annotate misogynist messages online but nevertheless provides a backbone for the fine-grained annotation of other target types.…”
Section: A Taxonomy For Religious Hate (mentioning)
Confidence: 99%