Detecting harmful content or hate speech on social media is a significant challenge because of the high throughput and large volume of content produced on these platforms. Identifying hate speech in a timely manner is crucial to preventing its dissemination. We propose a novel stacked ensemble approach for detecting hate speech in English tweets. The proposed architecture employs an ensemble of three base classifiers, namely support vector machine (SVM), logistic regression (LR), and XGBoost (XGB), trained using word2vec and universal encoding features. The meta classifier, LR, combines the outputs of the three base classifiers with the features employed by the base classifiers to produce the final output. We show that the proposed architecture outperforms the widely used single classifiers as well as standard stacking and a classifier ensemble using majority voting. We also present results on the use of various combinations of machine learning classifiers as base classifiers. On all four datasets, the proposed architecture improved performance compared with standard stacking, the base classifiers, and majority voting. Furthermore, on three of these datasets, the proposed architecture outperformed all state-of-the-art systems.
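The difference between the three combination schemes named in the abstract can be illustrated with a minimal sketch. It only shows how the input to the meta classifier would be assembled for a single tweet, under the assumption that each base classifier emits one score; the helper names `majority_vote` and `meta_input`, the scores, and the four-dimensional embedding are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[int]) -> int:
    """Return the most common label among the base-classifier predictions."""
    return Counter(predictions).most_common(1)[0][0]


def meta_input(base_outputs: Sequence[float], features: Sequence[float]) -> List[float]:
    """Concatenate base-classifier outputs with the original feature vector."""
    return list(base_outputs) + list(features)


# Hypothetical scores from SVM, LR, and XGB for one tweet (assumed values).
base_scores = [0.8, 0.6, 0.3]
# Hypothetical embedding of the same tweet (real embeddings are much longer).
embedding = [0.12, -0.40, 0.05, 0.33]

# Standard stacking: the meta classifier sees only the base outputs.
standard_stacking_row = list(base_scores)
# Scheme described in the abstract: base outputs plus the original features.
proposed_row = meta_input(base_scores, embedding)
# Majority voting: threshold each score at 0.5 and take the most common label.
vote = majority_vote([int(score >= 0.5) for score in base_scores])
```

In scikit-learn, a comparable arrangement can be obtained with `StackingClassifier(..., passthrough=True)`, which trains the final estimator on the base predictions together with the original training data; whether the authors used that class is not stated in the abstract.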
Social media sites, which have become central to our everyday lives, enable users to freely express their opinions, feelings, and ideas because of the depersonalization and anonymity they provide. Without control, these platforms may be used to propagate hate speech. In fact, hate speech on social media has increased in recent years. Therefore, there is a need to monitor and prevent hate speech on these platforms. However, manual control is not feasible because of the high volume of content produced on social media sites. Moreover, the language used and the length of the messages pose a challenge when classical machine learning approaches are used as prediction methods. This paper presents a genetic programming (GP) model for detecting hate speech in which each chromosome represents a classifier employing a universal sentence encoder as a feature. A novel mutation technique that affects only the feature values, used in combination with the standard one-point mutation technique, improves the performance of the GP model by enriching the offspring pool with alternative solutions. The proposed GP model outperforms all state-of-the-art systems on the four publicly available hate speech datasets.
INDEX TERMS Classification algorithms, genetic programming, machine learning, prediction methods.
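The contrast between the two mutation operators can be sketched as follows. The abstract does not specify the chromosome encoding, so this is an illustration under loud assumptions: a chromosome is taken to be a flat list of `(operator, value)` genes, the operator set is invented, and the Gaussian perturbation and its `scale` parameter are hypothetical choices, not the authors' method.

```python
import random
from typing import List, Tuple

# Hypothetical gene encoding: (operator, feature value). Not from the paper.
Gene = Tuple[str, float]
OPERATORS = ["+", "-", "*"]  # assumed operator set


def one_point_mutation(chromosome: List[Gene], rng: random.Random) -> List[Gene]:
    """Replace one randomly chosen gene with a freshly generated gene."""
    child = list(chromosome)
    index = rng.randrange(len(child))
    child[index] = (rng.choice(OPERATORS), rng.uniform(-1.0, 1.0))
    return child


def feature_value_mutation(
    chromosome: List[Gene], rng: random.Random, scale: float = 0.1
) -> List[Gene]:
    """Perturb only the numeric values; every operator stays unchanged."""
    return [(op, value + rng.gauss(0.0, scale)) for op, value in chromosome]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
parent: List[Gene] = [("+", 0.2), ("*", -0.5), ("-", 0.9)]
child_structural = one_point_mutation(parent, rng)
child_values = feature_value_mutation(parent, rng)
```

Under these assumptions, the value-only operator yields offspring that keep the parent's structure while exploring nearby feature values, which is one way an offspring pool could be enriched with alternative solutions.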