Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

Vidgen, Bertie; Thrush, Tristan; Waseem, Zeerak

doi:10.18653/v1/2021.acl-long.132

Cited by 63 publications

(70 citation statements)

References 47 publications

“…Our second study focused on collecting ratings for a larger set of posts, but with fewer annotators per post to simulate a crowdsourced dataset on toxic language. Drawing from two existing toxic language detection corpora, we select posts that are automatically detected as AAE and/or vulgar from Founta et al (2018), and posts that are automatically detected as vulgar and/or annotated as anti-Black from Vidgen et al (2021). Importantly, in this study, we consider anti-Black or AAE posts that could also be vulgar, and allow this vulgarity to cover both potentially offensive identity references (OI) as well as non-identity vulgar words (ONI; see §2.3).…”

Section: Breadth-of-posts Study (mentioning)

confidence: 99%

“…Anti-Black language denotes racially prejudiced or racist content-subtle (Breitfeller et al, 2019) or overt-which is often a desired target for toxic language detection research (Waseem, 2016; Vidgen et al, 2021). Based on prior work on linking conservative ideologies, endorsement of unrestricted speech, and racial prejudice with reduced likelihood to accept the term "hate speech" (Duckitt and Fisher, 2003; White and Crandall, 2017; Roussos and Dovidio, 2018; Elers and Jayan, 2020), we hypothesize that conservative annotators and those who score highly on the RACISTBELIEFS or FREEOFFSPEECH scales will rate anti-Black tweets as less toxic, and vice-versa.…”

Section: Rated As (mentioning)

confidence: 99%

“…the attitudes of annotators towards free speech, racism, and their beliefs on the harms of hate speech, for an accurate estimation of anti-Black language as toxic, offensive, or racist (e.g., by actively taking into consideration annotator ideologies; Waseem, 2016; Vidgen et al, 2021). This can be especially important given that hateful content very often targets marginalized groups and racial minorities (Silva et al, 2016; Sap et al, 2020), and can catalyze violence against them (O'Keeffe et al, 2011; Cleland, 2014).…”

Section: Rated As Racist (mentioning)

confidence: 99%

Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

Sap, Swayamdipta, Vianna et al. 2021

Preprint

Warning: this paper discusses and contains content that is offensive or upsetting. The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases. We seek to understand the who, why, and what behind biases in toxicity annotations. In two online studies with demographically and politically diverse participants, we investigate the effect of annotator identities (who) and beliefs (why), drawing from social psychology research about hate speech, free speech, racist beliefs, political leaning, and more. We disentangle what is annotated as toxic by considering posts with three characteristics: anti-Black language, African American English (AAE) dialect, and vulgarity. Our results show strong associations between annotator identity and beliefs and their ratings of toxicity. Notably, more conservative annotators and those who scored highly on our scale for racist beliefs were less likely to rate anti-Black language as toxic, but more likely to rate AAE as toxic. We additionally present a case study illustrating how a popular toxicity detection system's ratings inherently reflect only specific beliefs and perspectives. Our findings call for contextualizing toxicity labels in social variables, which raises immense implications for toxic language annotation and detection.

“…Overall, they are moderately aligned with the prescriptive paradigm. Vidgen et al (2021b) annotate hate speech. They provide annotators with fine-grained definitions for each category as well as very detailed annotation guidelines, and disagreements are resolved by an expert.…”

Section: A Overview Of Subjective Task Datasets (mentioning)

confidence: 99%

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

Röttger, Vidgen, Hovy et al. 2021

Preprint

Self Cite

Labelled data is the foundation of most natural language processing tasks. However, labelling data is difficult and there often are diverse valid beliefs about what the correct data labels should be. So far, dataset creators have acknowledged annotator subjectivity, but not actively managed it in the annotation process. This has led to partly-subjective datasets that fail to serve a clear downstream use. To address this issue, we propose two contrasting paradigms for data annotation. The descriptive paradigm encourages annotator subjectivity, whereas the prescriptive paradigm discourages it. Descriptive annotation allows for the surveying and modelling of different beliefs, whereas prescriptive annotation enables the training of models that consistently apply one belief. We discuss benefits and challenges in implementing both paradigms, and argue that dataset creators should explicitly aim for one or the other to facilitate the intended use of their dataset. Lastly, we design an annotation experiment to illustrate the contrast between the two paradigms.

“…In order to reduce this risk, one idea is to clean the harmful responses in the dataset, and the other is to detect the harmfulness of the results output by the model. Both of these can be achieved based on some recent works on offensive speech detection (Ranasinghe and Zampieri, 2020) or hate speech detection (Vidgen et al, 2021).…”

Section: Failure Modes (mentioning)

confidence: 99%

Retrieve, Discriminate and Rewrite: A Simple and Effective Framework for Obtaining Affective Response in Retrieval-Based Chatbots

Tian, Zhao et al. 2021

Findings of the Association for Computational Linguistics: EMNLP 2021

Obtaining affective response is a key step in building empathetic dialogue systems. This task has been studied a lot in generation-based chatbots, but the related research in retrieval-based chatbots is still in the early stage. Existing works in retrieval-based chatbots are based on Retrieve-and-Rerank framework, which have a common problem of satisfying affect label at the expense of response quality. To address this problem, we propose a simple and effective Retrieve-Discriminate-Rewrite framework. The framework replaces the reranking mechanism with a new discriminate-and-rewrite mechanism, which predicts the affect label of the retrieved high-quality response via discrimination module and further rewrites the affect unsatisfied response via rewriting module. This can not only guarantee the quality of the response, but also satisfy the given affect label. In addition, another challenge for this line of research is the lack of an off-the-shelf affective response dataset. To address this problem and test our proposed framework, we annotate a Sentimental Douban Conversation Corpus based on the original Douban Conversation Corpus. Experimental results show that our proposed framework is effective and outperforms competitive baselines.
