A systematic review of hate speech automatic detection using natural language processing

Jahan, Saroar; Oussalah, Mourad

doi:10.1016/j.neucom.2023.126232

Cited by 91 publications

(22 citation statements)

References 122 publications


“…In 2019, an NLP group from Turku University published FinBERT, a BERT-based pretrain language model for the Finnish language [29]. The FinBERT model is reported to have better performance than other popular models, including multilingual BERT, convolutional neural networks, and long short-term memory [30,31].…”

Section: Text Classification (mentioning)

confidence: 99%

Exploring Political Mistrust in Pandemic Risk Communication: Mixed-Method Study Using Social Media Data Analysis

Unlu, Truong, Tammi et al. 2023

J Med Internet Res


Background: This research extends prior studies by the Finnish Institute for Health and Welfare on pandemic-related risk perception, concentrating on the role of trust in health authorities and its impact on public health outcomes. Objective: The paper aims to investigate variations in trust levels over time and across social media platforms, as well as to further explore 12 subcategories of political mistrust. It seeks to understand the dynamics of political trust, including mistrust accumulation, fluctuations over time, and changes in topic relevance. Additionally, the study aims to compare qualitative research findings with those obtained through computational methods. Methods: Data were gathered from a large-scale data set consisting of 13,629 Twitter and Facebook posts from 2020 to 2023 related to COVID-19. For analysis, a fine-tuned FinBERT model with an 80% accuracy rate was used for predicting political mistrust. The BERTopic model was also used for superior topic modeling performance. Results: Our preliminary analysis identifies 43 mistrust-related topics categorized into 9 major themes. The most salient topics include COVID-19 mortality, coping strategies, polymerase chain reaction testing, and vaccine efficacy. Discourse related to mistrust in authority is associated with perceptions of disease severity, willingness to adopt health measures, and information-seeking behavior. Our findings highlight that the distinct user engagement mechanisms and platform features of Facebook and Twitter contributed to varying patterns of mistrust and susceptibility to misinformation during the pandemic. Conclusions: The study highlights the effectiveness of computational methods like natural language processing in managing large-scale engagement and misinformation. It underscores the critical role of trust in health authorities for effective risk communication and public compliance. The findings also emphasize the necessity for transparent communication from authorities, concluding that a holistic approach to public health communication is integral for managing health crises effectively.


“…The propagation of hate speech online continuously challenges policy-makers and the research community due to difficulties limiting the evolving cyberspace, the need to empower individuals to express their opinions, and the delay of manual checking (Jahan and Oussalah, 2023).…”

Section: Introduction (mentioning)

confidence: 99%

“…To reduce its risks and possible devastating effects on the lives of individuals, families, and communities, the NLP community has shown an increasing interest in developing tools that help in the automatic detection of hate speech on social media platforms (Husain and Uzuner, 2021) as the detection of hate speech can be, generally, modeled as a supervised learning problem (Schmidt and Wiegand, 2017). Several studies investigated the problem and contrasted various processing pipelines using various sets of features and classification algorithms [e.g., Naive Bayes, Support Vector Machine (SVM), deep learning architectures, and so on] (Jahan and Oussalah, 2023).…”

Section: Introduction (mentioning)

confidence: 99%

“…Fairly generic features, such as a bag of words or embeddings, resulted in reasonable classification performance, and character-level schemes outperformed token-level approaches (Schmidt and Wiegand, 2017). It is reported in the literature that even though information derived from text can be useful for detecting hate speech, it may be beneficial to use some meta-information or information from other media types (e.g., images attached to messages) (Jahan and Oussalah, 2023).…”

Section: Introduction (mentioning)

confidence: 99%


Hate speech detection in the Arabic language: corpus design, construction, and evaluation

Ahmad, Azzeh, Alnagi et al. 2024

Front. Artif. Intell.


Hate speech detection in Arabic presents a multifaceted challenge due to the broad and diverse linguistic terrain. With its multiple dialects and rich cultural subtleties, Arabic requires particular measures to address hate speech online successfully. To address this issue, academics and developers have used natural language processing (NLP) methods and machine learning algorithms adapted to the complexities of Arabic text. However, many proposed methods were hampered by a lack of a comprehensive dataset/corpus of Arabic hate speech. In this research, we propose a novel multi-class public Arabic dataset comprised of 403,688 annotated tweets categorized as extremely positive, positive, neutral, or negative based on the presence of hate speech. Using our developed dataset, we additionally characterize the performance of multiple machine learning models for hate speech identification in Arabic Jordanian dialect tweets. Specifically, the Word2Vec, TF-IDF, and AraBert text representation models have been applied to produce word vectors. With the help of these models, we can provide classification models with vectors representing text. After that, seven machine learning classifiers have been evaluated: Support Vector Machine (SVM), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), AdaBoost (Ada), XGBoost (XGB), and CatBoost (CatB). In light of this, the experimental evaluation revealed that, in this challenging and unstructured setting, our gathered and annotated datasets were rather efficient and generated encouraging assessment outcomes. This will enable academics to delve further into this crucial field of study.


“…Despite the significant advances in DL that have been made in recent years in a wide range of fields, including computer vision (CV) (semantic segmentation [20][21][22][23][24], scene understanding [25][26][27][28][29], pose estimation [30][31][32][33][34][35][36], action [36][37][38] or gesture [39][40][41][42][43] classification, face [44][45][46][47] or emotion [48][49][50][51] recognition, etc.), natural language processing (text analysis [52][53][54], language translation [55][56][57], sentiment analysis [58][59][60], question answering [61], etc.), speech recognition [62][63][64][65], and generative design (automated content generation…”

Section: Introduction (mentioning)

confidence: 99%

A Review of Recent Advances on Deep Learning Methods for Audio-Visual Speech Recognition

2023


This article provides a detailed review of recent advances in audio-visual speech recognition (AVSR) methods that have been developed over the last decade (2013–2023). Despite the recent success of audio speech recognition systems, the problem of audio-visual (AV) speech decoding remains challenging. In comparison to the previous surveys, we mainly focus on the important progress brought with the introduction of deep learning (DL) to the field and skip the description of long-known traditional “hand-crafted” methods. In addition, we also discuss the recent application of DL toward AV speech fusion and recognition. We first discuss the main AV datasets used in the literature for AVSR experiments since we consider it a data-driven machine learning (ML) task. We then consider the methodology used for visual speech recognition (VSR). Subsequently, we also consider recent AV methodology advances. We then separately discuss the evolution of the core AVSR methods, pre-processing and augmentation techniques, and modality fusion strategies. We conclude the article with a discussion on the current state of AVSR and provide our vision for future research.

