Visio-linguistic stress testing. Existing multimodal stress tests target a range of phenomena: correctly understanding implausible scenes [13], exploitation of language and vision priors [11,27], single-word mismatches between captions and images [64], hate speech detection [26,32,41,92], memes [39,75], ablating one modality to probe the other [22], distracting models with visually similar images [7,33] or with many textually similar suitable captions [1,17], collecting image-caption pairs more diverse than the predominantly English and North American/Western European datasets [50], understanding of verb-argument relationships [30], counting [53], and specific model failure modes [65,69]. Many of these stress tests rely solely on synthetically generated images, often with minimal visual differences but no correspondingly minimal textual changes [80].
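To make the contrast concrete, the sketch below shows one way a minimal-pair textual probe can be scored with an off-the-shelf CLIP-style model: two captions differing only in word order are ranked against a single image, and the model passes the item only if the caption that actually describes the image scores higher. The checkpoint, captions, and placeholder image are illustrative assumptions, not drawn from any of the cited benchmarks.

```python
# A hypothetical minimal-pair probe; a sketch, not any cited benchmark's code.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Two captions that differ by a single word swap; a real benchmark would pair
# each item with an image matching exactly one of the two captions.
captions = ["a dog chasing a cat", "a cat chasing a dog"]
image = Image.new("RGB", (224, 224))  # placeholder; use the dataset image here

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    # logits_per_image has shape (1, 2): image-to-caption similarity scores.
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

# The model "passes" the item only if the caption that actually describes the
# image receives the higher score; chance performance is 50%.
print({caption: float(p) for caption, p in zip(captions, probs[0])})
```

Extending this setup to pairs of images as well as pairs of captions yields the stricter setting the paragraph above alludes to, in which minimal visual and minimal textual changes must be resolved jointly.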