Racial Bias in Hate Speech and Abusive Language Detection Datasets

Davidson, Thomas; Bhattacharya, Debasmita; Weber, Ingmar

doi:10.18653/v1/w19-3504

Cited by 326 publications

(299 citation statements)

References 27 publications

Mentioning: 289

“…Furthermore, researchers have recently focused on the bias derived from the hate speech training datasets [2,21,24]. Davidson et al [2] showed that there were systematic and substantial racial biases in five benchmark Twitter datasets annotated for offensive language detection. Wiegand et al [24] also found that classifiers trained on datasets containing more implicit abuse (tweets with some abusive words) are more affected by biases rather than once trained on datasets with a high proportion of explicit abuse samples (tweets containing sarcasm, jokes, etc.).…”

Section: Previous Work (mentioning)

confidence: 99%

“…Hate speech is commonly defined as any communication criticizing a person or a group based on some characteristics such as gender, sexual orientation, nationality, religion, race, etc. Hate speech detection is not a stable or simple target because misclassification of regular conversation as hate speech can severely affect users freedom of expression and reputation, while misclassification of hateful conversations as unproblematic would maintain the status of online communities as unsafe environments [2].…”

Section: Introduction (mentioning)

confidence: 99%

A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

Mozafari

Farahbakhsh

Crespi

2019

Studies in Computational Intelligence

263

173

Generated hateful and toxic content by a portion of users in social media is a rising phenomenon that motivated researchers to dedicate substantial efforts to the challenging direction of hateful content identification. We not only need an efficient automatic hate speech detection model based on advanced machine learning and natural language processing, but also a sufficiently large amount of annotated data to train a model. The lack of a sufficient amount of labelled hate speech data, along with the existing biases, has been the main issue in this domain of research. To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers). More specifically, we investigate the ability of BERT at capturing hateful context within social media content by using new finetuning methods based on transfer learning. To evaluate our proposed approach, we use two publicly available datasets that have been annotated for racism, sexism, hate, or offensive content on Twitter. The results show that our solution obtains considerable performance on these datasets in terms of precision and recall in comparison to existing approaches. Consequently, our model can capture some biases in data annotation and collection process and can potentially lead us to a more accurate model.

“…Membership Query Synthesis might also be an interesting approach for tasks where the automatic extraction of large amounts of unlabelled data is not straight-forward. One example that comes to mind is the detection of offensive language or 'hate speech', where we have to deal with highly unbalanced training sets with only a small number of positive instances, and attempts to increase this number have been shown to result in systematically biased datasets (Davidson et al, 2019;Wiegand et al, 2019). Table 2 suggests that the generator produces instances with a more balanced class ratio (1.7 and 1.2) than the pool data (2.6) it was trained on.…”

Section: Discussion (mentioning)

confidence: 99%

Active Learning via Membership Query Synthesis for Semi-Supervised Sentence Classification

Schumann

Rehbein

2019

Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Active learning (AL) is a technique for reducing manual annotation effort during the annotation of training data for machine learning classifiers. For NLP tasks, pool-based and stream-based sampling techniques have been used to select new instances for AL, while generating new, artificial instances via Membership Query Synthesis was, up to now, considered to be infeasible for NLP problems. We present the first successful attempt to use Membership Query Synthesis for generating AL queries for natural language processing, using Variational Autoencoders for query generation. We evaluate our approach in a text classification task and demonstrate that query synthesis shows competitive performance to pool-based AL strategies while substantially reducing annotation time.

“…2 Our experiments show improvement over their results, as shown in "Experimental design and evaluation" section. Other research articles providing source code for hate detection model development and/or evaluation with links to code implementations that we could locate from our literature review include (implementations in footnotes) Waseem and Hovy [65], 3 Davidson et al [66], 4 ElSherief et al [67], 5 Saha et al [68], 6 Qian et al [69], 7 Ross et al [70], 8 de Gibert et al…”

Section: Research Gaps (mentioning)

confidence: 99%

Developing an online hate classifier for multiple social media platforms

Salminen

Hopf

Chowdhury

et al. 2020

Hum. Cent. Comput. Inf. Sci.

237

been identified as a major threat on online social media platforms. Pew Research Center [13] reports that among 4248 adults in the United States, 41% have personally experienced harassing behavior online, whereas 66% witnessed harassment directed towards others. Around 22% of adults have experienced offensive name-calling, purposeful embarrassment (22%), physical threats (10%), and sexual harassment (6%), among other types of harassment. Social media platforms are the most prominent grounds for such toxic behavior. Even though they often provide ways of flagging offensive and hateful content, only 17% of all adults have flagged harassing conversation, whereas only 12% of adults have reported someone for such acts [13].
