Social media networks such as Twitter facilitate mass communication but are increasingly used to propagate hate speech. Recent studies have highlighted a strong correlation between hate speech propagation and hate crimes such as xenophobic attacks. Given the scale of social media and the consequences of hate speech for society, it is essential to develop automated methods for hate speech detection across social media platforms. Several studies have investigated the application of different machine learning algorithms to hate speech detection. However, the performance of these algorithms is generally hampered by inefficient sequence transduction. Vanilla recurrent neural networks and recurrent neural networks with attention have been established as state-of-the-art methods for sequence modeling and sequence transduction tasks. Unfortunately, these methods suffer from intrinsic problems such as difficulty capturing long-term dependencies and a lack of parallelization. In this study, we investigate a transformer-based method and test it on a publicly available multiclass hate speech corpus containing 24,783 labeled tweets. The DistilBERT transformer method was compared against attention-based recurrent neural networks and other transformer baselines for hate speech detection in Twitter documents. The results show that the DistilBERT transformer outperformed the baseline algorithms while allowing parallelization.
Hate speech is an undesirable phenomenon with severe psychological and physical consequences. The emergence of mobile computing and Web 2.0 technologies has increasingly facilitated the spread of hate speech. The speed, accessibility and anonymity afforded by these tools make it difficult to enforce measures that minimise the spread of hate speech. The continued dissemination of hate speech online has triggered the development of various machine learning techniques for its automated detection. However, current approaches are inadequate because of further challenges such as the use of domain-specific language and language subtleties. Recent studies on automated hate speech detection have focused on deep learning as a possible solution to these challenges. Although some studies have explored deep learning methods for hate speech detection, no studies critically compare and evaluate their performance. This work investigates the use of deep learning algorithms as possible solutions to hate speech detection on Twitter. Three taxonomic classes of deep learning algorithms, namely traditional deep learning algorithms, traditional algorithms with a partial attention mechanism, and transformer models, which are based entirely on the attention mechanism, are evaluated using two publicly available corpora. One of the datasets contained 24 786 tweets annotated into three different classes, while the other contained 2300 tweets annotated into two different classes. All tweets from the two datasets were first preprocessed to rid them of characters and words deemed irrelevant to the classification decision, for instance, hashtags, stop words and punctuation marks. The preprocessed text was then transformed into feature vectors, which were used as input for the deep learning algorithms explored in this study. A series of experiments was performed to measure the performance of the deep learning algorithms in hate speech detection.
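The preprocessing step described above can be illustrated with a minimal sketch. It assumes a simple rule-based cleaner; the function name `preprocess_tweet`, the tiny `STOP_WORDS` set, and the lower-casing step are hypothetical choices made for illustration, since the abstract does not state which stop-word list or tokenizer was used.

```python
import re
import string
from typing import List

# Hypothetical, deliberately small stop-word list (assumption: the abstract
# does not say which list the study used).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "this"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_tweet(text: str) -> List[str]:
    """Return the tokens of a tweet with hashtags, punctuation and stop words removed."""
    text = text.lower()                    # assumption: case is not informative
    text = re.sub(r"#\w+", " ", text)      # drop hashtags such as "#angry"
    text = text.translate(_PUNCT_TABLE)    # drop punctuation marks
    return [tok for tok in text.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess_tweet("This is the WORST day, ever! #angry"))
```

The resulting token list would then be mapped to feature vectors; how that mapping is done is not specified in the abstract and is left out of this sketch.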
The algorithms were tested on a wide spectrum of tweets containing different forms of hate speech. The efficacy of the deep learning algorithms was objectively evaluated using six statistical evaluation metrics: precision, recall, F-measure, accuracy, Matthews correlation coefficient and area under the curve. The results indicate that variations in parameters do not affect the efficacy of the deep learning algorithms in equal proportion. The findings of this empirical study therefore give deep learning practitioners a better understanding of how to adapt robust deep learning techniques to automated hate speech detection tasks.
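As a rough illustration of the six metrics named above, the following sketch computes them for the binary case using their standard textbook definitions. It is not the study's evaluation code: the function names `binary_metrics` and `roc_auc` are hypothetical, label 1 is assumed to mean "hate speech", and the three-class corpus would additionally need an averaging scheme that the abstract does not specify.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F-measure, accuracy and MCC from 0/1 labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure,
            "accuracy": accuracy, "mcc": mcc}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the probability that a positive example
    receives a higher score than a negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # made-up classifier scores
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of examples, which is acceptable for a sketch but would usually be replaced by a rank-based computation on corpora of tens of thousands of tweets.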