The exponential growth of social media and microblogging sites not only provides platforms for freedom of expression and individual voices, but also enables people to express antisocial behavior such as online harassment, cyberbullying, and hate speech. Numerous works have used these data for social and antisocial behavior analysis by predicting the contexts, mostly for highly resourced languages like English. However, some languages are under-resourced, e.g., South Asian languages like Bengali, which lack computational resources for natural language processing (NLP). In this paper, we propose an explainable approach for hate speech detection in the under-resourced Bengali language, which we call DeepHateExplainer. In our approach, Bengali texts are first comprehensively preprocessed and then classified into political, personal, geopolitical, and religious hate by a neural ensemble of transformer-based architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased and uncased, and XLM-RoBERTa). Important terms are then identified with sensitivity analysis and layer-wise relevance propagation (LRP) to provide human-interpretable explanations. Evaluations against several machine learning (linear and tree-based models) and deep neural network (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1 scores of 84%, 90%, 88%, and 88% for political, personal, geopolitical, and religious hate, respectively, during 3-fold cross-validation tests.
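The abstract above does not say how the ensemble combines its member models. As a minimal sketch, assuming a soft-voting scheme that averages each model's class probabilities, the combination step could look like the following. The model names, the `soft_vote` helper, and the probability values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# The four hate categories named in the abstract.
LABELS = ["political", "personal", "geopolitical", "religious"]


def soft_vote(model_probs: Dict[str, List[float]]) -> str:
    """Average per-class probabilities across models and return the top label.

    Assumption: soft voting. The abstract only says "neural ensemble".
    """
    n_models = len(model_probs)
    averaged = [
        sum(probs[i] for probs in model_probs.values()) / n_models
        for i in range(len(LABELS))
    ]
    best = max(range(len(LABELS)), key=averaged.__getitem__)
    return LABELS[best]


# Made-up class probabilities for a single text, one row per model.
example = {
    "bangla_bert": [0.10, 0.60, 0.20, 0.10],
    "mbert_cased": [0.20, 0.30, 0.40, 0.10],
    "xlm_roberta": [0.10, 0.50, 0.30, 0.10],
}

print(soft_vote(example))
```

Here one model prefers "geopolitical", but averaging lets the two models that prefer "personal" decide the final label.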
Numerous works have employed machine learning (ML) and deep learning (DL) techniques on textual data from social media for antisocial behavior analysis, such as cyberbullying, fake news propagation, and hate speech detection, mainly for highly resourced languages like English. However, despite their diversity and millions of native speakers, some languages such as Bengali are under-resourced because they lack computational resources for natural language processing (NLP). As in English, Bengali social media content also includes images along with text (e.g., multimodal content is posted on Facebook by embedding short texts into images), so the textual data alone is not always enough to judge it (e.g., to determine whether it is hate speech). In those cases, images might give extra context for a proper judgement. This paper is about hate speech detection from multimodal Bengali memes and texts. We prepared the only multimodal hate speech detection dataset for this kind of problem in Bengali. We train several neural architectures (i.e., neural networks like Bi-LSTM/Conv-LSTM with word embeddings, and EfficientNet + transformer architectures such as monolingual Bangla BERT, multilingual BERT-cased/uncased, and XLM-RoBERTa) that jointly analyze textual and visual information for hate speech detection. The Conv-LSTM and XLM-RoBERTa models performed best for texts, yielding F1 scores of 0.78 and 0.82, respectively. For memes, the ResNet152 and DenseNet201 models yield F1 scores of 0.78 and 0.7, respectively. The multimodal fusion of mBERT-uncased + EfficientNet-B1 performed best among the multimodal models, yielding an F1 score of 0.80. Our study suggests that memes are moderately useful for hate speech detection in Bengali, but none of the multimodal models outperforms the unimodal models that analyze only textual data. Further, to foster reproducible research, we plan to make the datasets, source code, models, and notebooks available.
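The abstract does not describe how the text and image representations are fused. A minimal sketch of one common option, feature concatenation followed by a single logistic output, is shown below. The helper names, vector sizes, weights, and values are hypothetical; in practice the vectors would come from a text encoder and an image encoder such as those named above.

```python
import math
from typing import List


def fuse(text_vec: List[float], image_vec: List[float]) -> List[float]:
    """Concatenate text and image feature vectors (assumed fusion scheme)."""
    return list(text_vec) + list(image_vec)


def predict_hate(features: List[float], weights: List[float], bias: float) -> float:
    """Return a probability from a single logistic unit over the fused features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-ins for encoder outputs and learned parameters.
text_vec = [0.5, -1.0]
image_vec = [2.0]
weights = [1.0, 0.5, 0.25]
bias = 0.0

fused = fuse(text_vec, image_vec)
score = predict_hate(fused, weights, bias)
print(fused, round(score, 4))
```

With concatenation, the classifier sees both modalities at once, so an image feature can shift the decision when the text alone is ambiguous.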