Automatic Hate Speech Detection: A Literature Review

Filtering out offensive messages with human supervision can be a tedious and cumbersome task. There is a strong incentive to develop automatic hate speech detection, and many studies propose approaches ranging from classic machine learning to deep learning classification techniques [1,3,7]. Most of these algorithms require human-annotated training examples, written in the language of the analyzed messages, in order to distinguish offensive from non-offensive texts. Unfortunately, languages differ greatly in the datasets available to them, since most of the research has focused on English, German, Italian, or Spanish. In contrast, at present, there is only one dataset available for offensive speech detection in Romanian [5].

In this paper, we propose a novel Romanian language dataset for offensive and hate speech detection, News-RO-Offense, with 4052 records¹. In addition, we present several approaches for the automatic detection of insults, racism, homophobia, and sexism using classical machine learning and deep learning models.

RELATED WORK

There are several ways to classify offensive speech: by the type of message (e.g., insult, cyberbullying, sexism, racism, abuse), by the perceived target of the message (e.g., misogynistic, homophobic, antisemitic), or by whether the target is a person or a group. Zampieri et al. [14] proposed a three-level classification for offensive messages: the first level differentiates between offensive and non-offensive messages, the second level distinguishes between targeted and untargeted profanities, and the third level labels the targeted texts by target category, namely individual, group, or other. In contrast, Waseem et al. [13] make a distinction between generalized, directed, explicit, and implicit offenses.

Although the majority of studies focus on English, a growing number address other languages. Struß et al. [11] presented a classification based on the harshness of the offenses into PROFANITY, INSULT, and ABUSE classes; additionally, the authors divided hate speech by explicitness into implicit and explicit messages. For the Italian language, Sanguinetti et al. [9] annotated a Twitter-based corpus

¹ https://github.com/readerbench/news-ro-offense
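
As a minimal sketch, the three-level scheme of Zampieri et al. [14] described above could be encoded as nested labels in which each lower level is only defined when the level above permits it. The class and field names below (`Offensiveness`, `Targeting`, `TargetType`, `HierarchicalLabel`) are illustrative assumptions, not identifiers taken from [14] or from the News-RO-Offense dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Offensiveness(Enum):
    """Level 1: offensive versus non-offensive."""
    NOT_OFFENSIVE = "non-offensive"
    OFFENSIVE = "offensive"


class Targeting(Enum):
    """Level 2: targeted versus untargeted (offensive messages only)."""
    UNTARGETED = "untargeted"
    TARGETED = "targeted"


class TargetType(Enum):
    """Level 3: category of the target (targeted messages only)."""
    INDIVIDUAL = "individual"
    GROUP = "group"
    OTHER = "other"


@dataclass(frozen=True)
class HierarchicalLabel:
    """Hypothetical container for one annotated message."""
    level1: Offensiveness
    level2: Optional[Targeting] = None
    level3: Optional[TargetType] = None

    def __post_init__(self) -> None:
        # Level 2 is meaningful only for offensive messages.
        if self.level2 is not None and self.level1 is not Offensiveness.OFFENSIVE:
            raise ValueError("level 2 requires an offensive message")
        # Level 3 is meaningful only for targeted messages.
        if self.level3 is not None and self.level2 is not Targeting.TARGETED:
            raise ValueError("level 3 requires a targeted message")


# An offensive message aimed at a group.
example = HierarchicalLabel(
    Offensiveness.OFFENSIVE, Targeting.TARGETED, TargetType.GROUP
)
```

Rejecting inconsistent combinations at construction time is one possible design choice; an annotation pipeline could equally store the three levels as separate columns and validate them later.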
