Automated Hate Speech Detection and the Problem of Offensive Language

Davidson, Thomas; Warmsley, Dana; Macy, Michael W.; Weber, Ingmar

doi:10.48550/arxiv.1703.04009

Cited by 55 publications

(106 citation statements)

References 0 publications

Supporting

Mentioning

103

Contrasting

Unclassified

Order By: Relevance

“…Hate speech, defined as speech that targets social groups with the intent to cause harm, is arguably the most widely studied form of incivility detection, largely due to the practical need to moderate online discussions. Many Twitter datasets have been collected, of racist and sexist tweets (Waseem and Hovy, 2016), of hateful and offensive tweets 2 Available at https://github.com/ anushreehede/incivility_in_news (Davidson et al, 2017), and of hateful, abusive, and spam tweets (Founta et al, 2018). Another category of incivility detection that more closely aligns with our work is toxicity prediction.…”

Section: Datasets For Incivility Detectionsupporting

confidence: 67%

From Toxicity in Online Comments to Incivility in American News: Proceed with Caution

Hede¹,

Agarwal²,

Lu³

et al. 2021

Preprint

View full text Add to dashboard Cite

The ability to quantify incivility online, in news and in congressional debates, is of great interest to political scientists. Computational tools for detecting online incivility for English are now fairly accessible and potentially could be applied more broadly. We test the Jigsaw Perspective API for its ability to detect the degree of incivility on a corpus that we developed, consisting of manual annotations of civility in American news. We demonstrate that toxicity models, as exemplified by Perspective, are inadequate for the analysis of incivility in news. We carry out error analysis that points to the need to develop methods to remove spurious correlations between words often mentioned in the news, especially identity descriptors and incivility. Without such improvements, applying Perspective or similar models on news is likely to lead to wrong conclusions, that are not aligned with the human perception of incivility.

show abstract

Section: Datasets For Incivility Detectionsupporting

confidence: 67%

From Toxicity in Online Comments to Incivility in American News: Proceed with Caution

Hede¹,

Agarwal²,

Lu³

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…We are most interested in responses generated by dialogue models in offensive contexts. However, offensive language is rare in a random sample (Davidson et al, 2017;Founta et al, 2018). Hence, we implement a two-stage sampling strategy: (1) Random sample -From both sources, randomly sample 500 threads (total 1000).…”

Section: Data Collectionmentioning

confidence: 99%

“…Identifying Toxicity -Most work on identifying toxic language looked at a individual social media posts or comments without taking context into account (Davidson et al, 2017;Xu et al, 2012;Zampieri et al, 2019;Rosenthal et al, 2020;Kumar et al, 2018;Garibo i Orts, 2019;Ousidhoum et al, 2019;Breitfeller et al, 2019;Hada et al, 2021;Barikeri et al, 2021)…”

Section: Related Workmentioning

confidence: 99%

Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

Baheti¹,

Sap²,

Ritter³

et al. 2021

Preprint

View full text Add to dashboard Cite

Dialogue models trained on human conversations inadvertently learn to generate offensive responses. Moreover, models can insult anyone by agreeing with an offensive context. To understand the dynamics of contextually offensive language, we study the stance of dialogue model responses in offensive Reddit conversations. Specifically, we crowd-annotate TOXI-CHAT, a new dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance. Our analysis reveals that 42% of user responses agree with toxic comments; 3× their agreement with safe comments (13%). Pre-trained transformer-based classifiers fine-tuned on our dataset achieve 0.71 F 1 for offensive labels and 0.53 Macro-F 1 for stance labels. Finally, we analyze some existing controllable text generation (CTG) methods to mitigate the contextual offensive behavior of dialogue models. Compared to the baseline, our best CTG model obtains a 19% reduction in agreement with offensive context and 29% fewer offensive responses. This highlights the need for future work to characterize and analyze more forms of inappropriate behavior in dialogue models to help make them safer. 1

show abstract

“…Detecting offensive and abusive content online is a critical step in mitigating the harms it causes to people (Waseem and Hovy, 2016;Davidson et al, 2017). Various online platforms have increasingly turned to NLP techniques to do this task at scale (e.g., the Perspective API).…”

Section: Introductionmentioning

confidence: 99%

Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media

Ghosh

Baker

Jurgens

et al. 2021

Proceedings of the Seventh Workshop on Noisy User-Generated Text (W-Nut 2021)

View full text Add to dashboard Cite

Online social media platforms increasingly rely on Natural Language Processing (NLP) techniques to detect abusive content at scale in order to mitigate the harms it causes to their users. However, these techniques suffer from various sampling and association biases present in training data, often resulting in sub-par performance on content relevant to marginalized groups, potentially furthering disproportionate harms towards them. Studies on such biases so far have focused on only a handful of axes of disparities and subgroups that have annotations/lexicons available. Consequently, biases concerning non-Western contexts are largely ignored in the literature. In this paper, we introduce a weakly supervised method to robustly detect lexical biases in broader geocultural contexts. Through a case study on a publicly available toxicity detection model, we demonstrate that our method identifies salient groups of cross-geographic errors, and, in a follow up, demonstrate that these groupings reflect human judgments of offensive and inoffensive language in those geographic contexts. We also conduct analysis of a model trained on a dataset with ground truth labels to better understand these biases, and present preliminary mitigation experiments.

show abstract

Automated Hate Speech Detection and the Problem of Offensive Language

Cited by 55 publications

References 0 publications

From Toxicity in Online Comments to Incivility in American News: Proceed with Caution

From Toxicity in Online Comments to Incivility in American News: Proceed with Caution

Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media

Contact Info

Product

Resources

About