Over the past decade, digital communication has grown to a massive global scale. Unfortunately, cyberbullying has become prevalent alongside it, with perpetrators hiding behind the relative anonymity of the internet. This work reviews prominent classification algorithms and proposes an ensemble model for identifying cases of cyberbullying in Twitter datasets. The algorithms evaluated are Naive Bayes, K-Nearest Neighbors, Logistic Regression, Decision Tree, Random Forest, Linear Support Vector Classifier (SVC), Adaptive Boosting, Stochastic Gradient Descent and Bagging classifiers. The classifiers were compared experimentally on four metrics: accuracy, precision, recall and F1 score, and the results report the performance of every algorithm on each metric. Random Forest proved to be the best-performing classifier, with medians of 0.77, 0.73 and 0.94 across the datasets, while Linear SVC was the least effective of all. The ensemble model improved on the results of its constituent classifiers, with medians of 0.77, 0.66 and 0.94, as against 0.59, 0.42 and 0.86 for Linear SVC.
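To make the four evaluation metrics and the idea of an ensemble concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the ensemble combines its constituent classifiers, so simple majority (hard) voting is assumed here, and the function names `binary_metrics` and `majority_vote`, the label convention (1 = cyberbullying, 0 = not) and the toy predictions are all hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels.

    Assumption: `positive` marks the cyberbullying class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def majority_vote(predictions):
    """Hard-voting ensemble over 0/1 predictions.

    `predictions` is a list with one list of labels per classifier.
    A tweet is labelled 1 only if more than half of the classifiers say 1.
    This voting rule is an assumption, not taken from the paper.
    """
    n_classifiers = len(predictions)
    return [1 if 2 * sum(votes) > n_classifiers else 0
            for votes in zip(*predictions)]


# Hypothetical toy data: true labels and three classifiers' predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
clf_a = [1, 0, 0, 1, 0, 1, 1, 0]
clf_b = [1, 1, 1, 0, 0, 0, 1, 0]
clf_c = [0, 0, 1, 1, 0, 0, 1, 1]

ensemble = majority_vote([clf_a, clf_b, clf_c])

for name, pred in [("clf_a", clf_a), ("clf_b", clf_b),
                   ("clf_c", clf_c), ("ensemble", ensemble)]:
    print(name, binary_metrics(y_true, pred))
```

In this toy example each classifier makes two mistakes on different tweets, so the vote corrects them all; real classifiers tend to make correlated errors, and an ensemble then gains less. In practice the nine algorithms listed would more likely come from a library such as scikit-learn, which is also an assumption here since the abstract names no toolkit.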