Homophobic expressions insult people's sexual orientation or personality. People exposed to this type of communication may suffer severe psychological trauma. It is therefore important to develop automatic classification systems based on language models that examine social media content and identify homophobic discourse. This study presents a pre-trained Multilingual Bidirectional Encoder Representations from Transformers (M-BERT) model that detects whether Turkish social media comments contain homophobic or related hate speech (i.e., sexist, severely humiliating, and defecation-related expressions). To train the detection models, comments for the Homophobic-Abusive Turkish Comments (HATC) dataset were collected from Instagram. The HATC dataset was manually labeled at the sentence level and combined with the Abusive Turkish Comments (ATC) dataset developed in our previous study. The HATC dataset was balanced using a resampling method, and two forms of the dataset (i.e., the resampled resHATC and the original HATC) were used in the experiments. For model selection, the M-BERT model was compared with Deep Learning (DL) models (i.e., Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Gated Recurrent Unit (GRU)), Traditional Machine Learning (TML) classifiers (i.e., Support Vector Machine, Naive Bayes, and Random Forest), and ensemble classifiers (i.e., Adaptive Boosting, eXtreme Gradient Boosting, and Gradient Boosting). The detection models were evaluated using F1-score, precision, and recall. The best performance (homophobic F1-score: 82.64%, hateful F1-score: 91.75%, neutral F1-score: 96.08%, average F1-score: 90.15%) was achieved by the M-BERT model on the HATC dataset. The M-BERT detection model can increase the effectiveness of filters that detect Turkish homophobic and related hate speech on social networks. Because M-BERT is pre-trained on multilingual data, it could also be applied to detecting homophobic and related hate speech in other languages.
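The per-class F1-scores and their average reported above can be computed from model predictions with the following minimal Python sketch. It assumes the average is an unweighted (macro) mean over the three classes, which the abstract does not state explicitly; the function names and toy labels are hypothetical and not taken from the study.

```python
def per_class_prf(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)}, treating each label one-vs-rest."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # Define a ratio as 0.0 when its denominator is zero.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy example with hypothetical gold labels and predictions.
y_true = ["homophobic", "hateful", "neutral", "neutral", "hateful", "homophobic"]
y_pred = ["homophobic", "neutral", "neutral", "neutral", "hateful", "hateful"]
scores = per_class_prf(y_true, y_pred, ["homophobic", "hateful", "neutral"])
print(round(macro_f1(scores), 4))  # 0.6556
```

For real experiments, scikit-learn's `precision_recall_fscore_support` (or `f1_score` with `average="macro"`) computes the same quantities.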
Social media is one of the most frequently used platforms today. Users can easily share their views, ideas, and thoughts on it. The large volume of data shared on social media platforms can be transformed into meaningful information, and the resulting big data can be analyzed and evaluated using various data analysis methods. Sentiment analysis methods can determine whether the data express a sentiment and, if so, its polarity (i.e., positive, negative, or neutral). Later Sentiment Analysis studies turned to analyses covering a wider range of sentiments, which laid the foundations of Opinion Mining. Opinion Mining is concerned with the semantic representation of the ideas conveyed in social media content. The purpose of this paper is to explain the relationship between the concepts of Sentiment Analysis and Opinion Mining. The terms used in Sentiment Analysis and Opinion Mining are explained, examples of Turkish Sentiment Analysis are given, and the paper attempts to suggest solutions for the problems encountered in Turkish studies.
First identified in Wuhan, China, the coronavirus disease (COVID-19) became a worldwide epidemic. Turkey's first case was reported on March 11, 2020, the day the World Health Organization declared COVID-19 a pandemic. Because social media was used intensely and widely during the pandemic, determining the role and effect (i.e., positive, negative, neutral) of social media provides important information about society's perspective on events. In our study, two datasets (i.e., Dataset1 and Dataset2) consisting of Instagram comments on COVID-19 were compiled over different periods of the pandemic, and changes in users' feelings and thoughts about the epidemic were analyzed. To our knowledge, these are the first publicly available Turkish datasets for sentiment analysis of COVID-19. Sentiment analysis of the Turkish Instagram comments was performed using Machine Learning models (i.e., Traditional Machine Learning, Deep Learning, and BERT-based Transfer Learning). The experiments used balanced versions of these datasets (i.e., resDataset1 and resDataset2) in addition to the original ones. The BERT-based Transfer Learning model achieved the highest classification performance, with macro-averaged F1 scores of 0.7864 on resDataset1 and 0.7120 on resDataset2. These results show that using a pre-trained language model on Turkish datasets yields better classification performance than the other models.
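The abstracts do not specify which resampling technique produced the balanced datasets (resHATC, resDataset1, resDataset2). As one illustration only, the sketch below assumes simple random oversampling: comments from smaller classes are duplicated at random until every class matches the size of the largest one. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(texts, labels, seed=0):
    """Assumed technique: duplicate randomly chosen examples of each smaller
    class until every class has as many examples as the largest class."""
    rng = random.Random(seed)  # fixed seed for reproducible duplication
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw (with replacement) the extra copies this class needs.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy example: three "neutral" comments, one each of the other classes.
texts = ["c1", "c2", "c3", "c4", "c5"]
labels = ["neutral", "neutral", "neutral", "hateful", "homophobic"]
bal_texts, bal_labels = random_oversample(texts, labels)
print(len(bal_texts))  # 9
```

Resampling of this kind is normally applied only to the training split, so that duplicated comments do not end up in both training and test data and inflate the reported scores.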