Proceedings of the Third Workshop on Abusive Language Online 2019
DOI: 10.18653/v1/w19-3516
A Platform Agnostic Dual-Strand Hate Speech Detector

Abstract: Hate speech detectors must be applicable across a multitude of services and platforms, and there is hence a need for detection approaches that do not depend on any information specific to a given platform. For instance, the information stored about the text's author may differ between services, and so using such data would reduce a system's general applicability. The paper thus focuses on using exclusively text-based input in the detection, in an optimised architecture combining Convolutional Neural Networks a…

Cited by 13 publications (17 citation statements)
References 25 publications
“…Three studies compared character-level, word-level, and hybrid (both character- and word-level) CNNs, but drew completely different conclusions. Park (2018) and Meyer & Gambäck (2019) found hybrid and character CNN to perform best respectively. Probably most surprisingly, Lee, Yoon & Jung (2018) observed that word and hybrid CNNs outperformed character CNN to similar extents, with all CNNs performing worse than character n-gram logistic regression.…”
Section: Obstacles To Generalisable Hate Speech Detection
confidence: 99%
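The statement above contrasts character-level and word-level features, with character n-gram logistic regression as a strong baseline. Why character n-grams cope with adversarial misspellings can be illustrated with a minimal, stdlib-only extractor (a hypothetical sketch; the function name and parameters are illustrative, not code from any of the cited studies):

```python
from collections import Counter

def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max (illustrative helper)."""
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams

# A word-level tokenizer sees "stupid" and "stuupid" as unrelated tokens,
# but most of their character n-grams still overlap, so a linear model
# over these counts retains signal under spelling variation.
a = char_ngrams("stupid")
b = char_ngrams("stuupid")  # adversarial misspelling
shared = set(a) & set(b)
```

In the cited baselines, counts like these would feed a logistic regression classifier; the sketch stops at feature extraction to stay self-contained.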
“…Their argument for adopting the traditional approach was to provide better explainability of the knowledge transfer between domains. Some other studies adopted several neural-based models, including convolutional neural networks (CNN) [75,141], long short-term memory (LSTM) [8,75,92,94,145], bidirectional LSTM (Bi-LSTM) [115], and gated recurrent unit (GRU) [27]. The most recent works focus more on investigating transferability or generalizability of state-of-the-art transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) [19,48,66,79,83,90,92,134] and its variants like RoBERTa [48] in the cross-domain abusive language detection task.…”
Section: Models
confidence: 99%
“…[134] Neural model This study proposed several LSTM-based models that focus only on using text-information representations (character n-grams and word embeddings) for building a platform-agnostic hate speech detector, but the authors did not conduct any cross- or multi-domain experiment to evaluate their model. [75] Transformer based Experimented with a BERT-based classifier and a topic-modeling approach, showing that removing domain-specific instances improves the model's out-of-domain performance. [83] Neural based Proposed several representations, including target, content, and linguistic behavior, and used a cross-attention gate flow to refine these representations, providing better domain-transfer knowledge.…”
Section: Models
confidence: 99%
“…For the most part, word-level n-grams have been highly predictive, with other linguistic features such as part-of-speech tags (Xu et al, 2012; Davidson et al, 2017) and sentiment score (Van Hee et al, 2015; Davidson et al, 2017) providing slight improvements. Due to their ability to perform better in an online setting where spelling errors and adversarial behaviour are commonplace, character-level features have been endorsed, and also shown to often be superior to word-level information for this task (Meyer and Gambäck, 2019). Metadata about users have also been used as features: Waseem and Hovy (2016) claim gender information leads to improved performance, while Unsvåg and Gambäck (2018) report user-network data to be more important.…”
Section: Previous Work
confidence: 99%
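The last statement describes feature sets that combine word n-grams with a sentiment score. A minimal sketch of such a combination, using a toy lexicon (the lexicon and helper names are hypothetical, not taken from any of the cited papers):

```python
import re

# Toy negative-word lexicon for illustration only; the cited studies
# used full sentiment resources, not this three-word set.
NEG_LEXICON = {"hate", "stupid", "awful"}

def word_ngrams(text, n=2):
    """Set of word n-grams (here bigrams) from whitespace/word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def sentiment_score(text):
    """Negative count of lexicon hits, a crude stand-in for a sentiment feature."""
    tokens = re.findall(r"\w+", text.lower())
    return -sum(t in NEG_LEXICON for t in tokens)

# Combine both feature types into one record, as the quoted feature
# sets do before classification.
feats = {
    "bigrams": word_ngrams("I hate this stupid site"),
    "sentiment": sentiment_score("I hate this stupid site"),
}
```

A real system would vectorize the n-gram set and append the sentiment score as an extra dimension before training a classifier.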