The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English

Ljubešić, Nikola; Fišer, Darja; Erjavec, Tomaž

doi:10.1007/978-3-030-27947-9_9

Cited by 35 publications

(34 citation statements)

References 7 publications


“…FRENK (Ljubešić et al., 2019) The FRENK datasets consist of Facebook comments in English and Slovene covering LGBT and migrant topics. The datasets were manually annotated for fine-grained types of socially unacceptable discourse (e.g., violence, offensiveness, threat).…”

Section: Data (mentioning)

confidence: 99%

Improving Cross-Domain Hate Speech Detection by Reducing the False Positive Rate

Markov, Daelemans

2021

Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda


Hate speech detection is an actively growing field of research, with a variety of recently proposed approaches that have advanced the state of the art. One of the challenges of such automated approaches, namely recent deep learning models, is the risk of false positives (i.e., false accusations), which may lead to over-blocking or removal of harmless social media content in applications with little moderator intervention. We evaluate deep learning models under both in-domain and cross-domain hate speech detection conditions, and introduce an SVM approach that significantly improves the state-of-the-art results when combined with the deep learning models through a simple majority-voting ensemble. The improvement is mainly due to a reduction of the false positive rate.
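
The majority-voting combination mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the three prediction lists, and the binary label convention (1 = hate speech, 0 = not) are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists into one label per example by majority vote."""
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must label the same number of examples")
    combined = []
    for votes in zip(*predictions):
        # Pick the label that most models agree on; with an odd number
        # of voters and binary labels a tie cannot occur.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs from an SVM and two deep learning models.
svm_preds = [0, 1, 0, 0]
deep_a_preds = [1, 1, 0, 1]
deep_b_preds = [1, 1, 0, 0]

ensemble = majority_vote([svm_preds, deep_a_preds, deep_b_preds])
print(ensemble)
```

In the last example only one model predicts the positive class, so the ensemble outputs 0. This is the kind of outvoted positive prediction that would lower a false positive rate, under the assumption that the lone positive vote was wrong.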


“…The English language is well-resourced and researched [19,22,24]. Recently, hate speech detection studies appeared for Croatian [25,27,29] and Slovene [31,33,34].…”

Section: Hate Speech Detection (mentioning)

confidence: 99%

“…The Slovene dataset was produced in the Slovenian national project FRENK. The text dataset used in the experiment is a combination of two different studies of Facebook comments [33]. The first group of comments was collected on LGBT homophobia topics, while the second on anti-migrant posts.…”

Section: Hate Speech Datasets (mentioning)

confidence: 99%

To BAN or Not to BAN: Bayesian Attention Networks for Reliable Hate Speech Detection

et al. 2021


Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify their output in terms of reliability. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions.
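
The idea of Monte Carlo dropout described in this abstract (keep dropout active at prediction time, run several stochastic forward passes, and read the spread of the outputs as a reliability signal) can be sketched on a toy model. This is only an illustration under stated assumptions: it uses a tiny logistic model rather than a transformer with dropout in its attention layers, and every name, weight, and parameter value below is hypothetical.

```python
import math
import random
from statistics import mean, pstdev
from typing import Sequence, Tuple


def stochastic_forward(features: Sequence[float], weights: Sequence[float],
                       bias: float, drop_prob: float, rng: random.Random) -> float:
    """One forward pass of a toy logistic model with dropout left switched on."""
    keep = 1.0 - drop_prob
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            # Inverted dropout: rescale kept inputs so the expected sum is unchanged.
            total += (x / keep) * w
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(features: Sequence[float], weights: Sequence[float],
                       bias: float, drop_prob: float = 0.3,
                       n_samples: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Return (mean probability, standard deviation) over n_samples stochastic passes."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, bias, drop_prob, rng)
               for _ in range(n_samples)]
    return mean(samples), pstdev(samples)


# Hypothetical feature vector and weights for a single comment.
prob, spread = mc_dropout_predict([1.0, 2.0, -1.0], [2.0, -1.5, 0.7], bias=0.0,
                                  drop_prob=0.5, n_samples=200)
print(f"mean probability={prob:.3f}, spread={spread:.3f}")
```

A larger spread across passes marks a prediction the model is less sure of, which is the sense in which such a method could flag less trusted predictions for a moderator.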


“…The Dutch LiLaH corpus consists of approximately 36,000 Facebook comments on online news articles related to migrants or the LGBT community mined from three popular Flemish newspaper pages (HLN, Het Nieuwsblad and VRT). The corpus, which has been used in several recent studies on hate speech detection in Dutch, e.g., (Markov et al., 2021; Ljubešić et al., 2020), was annotated for the type and target of hateful comments following the same procedure and annotation guidelines as presented in (Ljubešić et al., 2019), that is, with respect to the type of hate speech, the possible classes were violent speech and offensive speech (either triggered by the target's personal background, e.g., religion, gender, sexual orientation, nationality, etc., or on the basis of individual characteristics), inappropriate speech (without a specific target), and appropriate speech. The targets, on the other hand, were divided into migrants and the LGBT community, people related to either of these communities (e.g., people who support them), the journalist who wrote or medium that provided the article, another commenter, other targets and no target.…”

Section: Corpus Description (mentioning)

confidence: 99%

Improving Hate Speech Type and Target Detection with Hateful Metaphor Features

Lemmens, Markov, Daelemans

2021

Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda


We study the usefulness of hateful metaphors as features for the identification of the type and target of hate speech in Dutch Facebook comments. For this purpose, all hateful metaphors in the Dutch LiLaH corpus were annotated and interpreted in line with Conceptual Metaphor Theory and Critical Metaphor Analysis. We provide SVM and BERT/RoBERTa results, and investigate the effect of different metaphor information encoding methods on hate speech type and target detection accuracy. The results of the conducted experiments show that hateful metaphor features improve model performance for both tasks. To our knowledge, this is the first time that the effectiveness of hateful metaphors as an information source for hate speech classification has been investigated.
