Towards generalisable hate speech detection: a review on obstacles and solutions

Yin, Wenjie; Zubiaga, Arkaitz

doi:10.48550/arxiv.2102.08886

Cited by 6 publications

(4 citation statements)

References 85 publications

(148 reference statements)


“…If the goal is to create a model that performs as well as possible on one dataset, then the traditional approach is appropriate. On the other hand, if we want to create a model that will generalise across time and topic, we believe it would be sensible for researchers to introduce domain specific knowledge and to also use an alternative test-set, as has been done in other fields (Yin and Zubiaga, 2021). Whilst machine learning can deliver impressive results, there is value in understanding relevant theory, as shown in our results.…”

Section: Discussion (mentioning)

confidence: 78%

Style over substance: A psychologically informed approach to feature selection and generalisability for author classification

Holmes, Cribbin, Ferenczi (2023). Computers in Human Behavior Reports


“…Debias Hate Speech Detection. Recent works (Yin and Zubiaga 2021; Wiegand et al 2019; Kennedy et al, 2020; Ma et al, 2020; Gehman et al, 2020; Dreier et al, 2022; Stanovsky et al, 2019; Thakur et al, 2023; Ziems et al, 2023) have been studying the generalizability and biases for hate speech detection (Talat et al, 2018; AlKhamissi et al, 2022; Röttger et al, 2022; Bianchi et al, 2022). For instance, prior work found that existing hate speech detection models are biased against African American Vernacular English Speakers (Harris et al, 2022b; Sap et al, 2019) and certain identity words are highly correlated with these hateful labels (Bender et al, 2021; ElSherief et al, 2021a).…”

Section: Related Work (mentioning)

confidence: 99%

Mitigating Biases in Hate Speech Detection from A Causal Perspective

Zhang, Chen, Yang (2023). Findings of the Association for Computational Linguistics: EMNLP 2023


Warning: This paper discusses and contains offensive or upsetting content. Nowadays, many hate speech detectors are built to automatically detect hateful content. However, their training sets are sometimes skewed towards certain stereotypes (e.g., race or religion-related). As a result, the detectors are prone to depend on some shortcuts for predictions. Previous works mainly focus on token-level analysis and heavily rely on human experts' annotations to identify spurious correlations, which is not only costly but also incapable of discovering higher-level artifacts. In this work, we use grammar induction to find grammar patterns for hate speech and analyze this phenomenon from a causal perspective. Concretely, we categorize and verify different biases based on their spuriousness and influence on the model prediction. Then, we propose two mitigation approaches including Multi-Task Intervention and Data-Specific Intervention based on these confounders. Experiments conducted on 9 hate speech datasets demonstrate the effectiveness of our approaches. The code is available at https://github.com/SALT-NLP/Bias_Hate_Causal.


“…In the context of automated detection studies, offensive and abusive language are both used as overarching words for harmful content. Offensive language has a broader reach, and hope speech falls under each of these categories (Hande et al, 2021b; Yin and Zubiaga, 2021). The strong relationship between hate speech and actual hate crimes highlight the significance of identifying and moderating hate speech.…”

Section: Related Work (mentioning)

confidence: 99%

Offensive Language Identification in Low-resourced Code-mixed Dravidian languages using Pseudo-labeling

Hande, Puranik, Yasaswini et al. (2021). Preprint


Social media has effectively become the prime hub of communication and digital marketing. As these platforms enable the free manifestation of thoughts and facts in text, images and video, there is an extensive need to screen them to protect individuals and groups from offensive content targeted at them. Our work intends to classify code-mixed social media comments/posts in the Dravidian languages of Tamil, Kannada, and Malayalam. We intend to improve offensive language identification by generating pseudo-labels on the dataset. A custom dataset is constructed by transliterating all the code-mixed texts into the respective Dravidian language, either Kannada, Malayalam, or Tamil and then generating pseudo-labels for the transliterated dataset. The two datasets are combined using the generated pseudo-labels to create a custom dataset called CM-TRA. As Dravidian languages are under-resourced, our approach increases the amount of training data for the language models. We fine-tune several recent pretrained language models on the newly constructed dataset. We extract the pretrained language embeddings and pass them onto recurrent neural networks. We observe that fine-tuning ULMFiT on the custom dataset yields the best results on the code-mixed test sets of all three languages. Our approach yields the best results among the benchmarked models on Tamil-English, achieving a weighted F1-Score of 0.7934 while scoring competitive weighted F1-Scores of 0.9624 and 0.7306 on the code-mixed test sets of Malayalam-English and Kannada-English, respectively. The data and codes for the approaches discussed in our work have been released at https://github.com/adeepH/Dravidian-OLI.
