Abstract. The negative consequences of cyberbullying are becoming more alarming every day, and technical solutions that allow for taking appropriate action by means of automated detection are still very limited. Until now, studies on cyberbullying detection have focused on individual comments only, disregarding context such as users' characteristics and profile information. In this paper we show that taking user context into account improves the detection of cyberbullying.

Introduction

More and more teenagers in online communities are exposed to and harmed by cyberbullying. Studies 1 show that in Europe about 18% of children have been involved in cyberbullying, leading to severe depression and even suicide attempts. Cyberbullying is defined as an aggressive, intentional act carried out by a group or individual, using electronic forms of contact repeatedly or over time, against a victim who cannot easily defend him- or herself [1]. Besides social measures, technical solutions have to be found to deal with this social problem. At present, social network platforms rely on users alerting network moderators, who in turn may remove bullying comments. The potential for alerting moderators can be improved by automatically detecting such comments, allowing a moderator to act faster. Studies on automatic cyberbullying detection are few and typically limited to individual comments, without taking context into account [2,3]. In this study we show that taking user context, such as a user's comment history and user characteristics [4], into account can considerably improve the performance of detection tools for cyberbullying incidents. We approach cyberbullying detection as a supervised classification task for which we investigated three incremental feature sets.
In the next sections the experimental setup and results are described, followed by a discussion of related work and conclusions.

1 EU COST Action IS0801 on Cyberbullying (https://sites.google.com/site/costis0801/).

694 M. Dadvar et al.

Experiment

Corpus

YouTube is the world's largest user-generated content site, and its broad scope in terms of audience, videos, and users' comments makes it a platform that is susceptible to bullying and therefore appropriate for collecting datasets for cyberbullying studies. As no cyberbullying dataset was publicly available, we collected a dataset of comments on YouTube videos. To cover a variety of topics, we collected the comments on the top 3 videos in the different categories found on YouTube. For each comment, the user id and its date and time were also stored. Only users with public profiles (78%) were kept. The final dataset consists of 4626 comments from 3858 distinct users. The comments were manually labelled as bullying (9.7%) or non-bullying based on the definition of cyberbullying used in this study (inter-annotator agreement 93%). For each user we collected the comment history, consisting of up to 6 months of comments, on average 54 comments per user.

Feature Space Design

The following three feature sets were...
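The idea of incremental feature sets, comment-level features extended with user-context features drawn from the comment history, can be sketched as follows. This is a minimal illustration, not the paper's actual feature space: the profanity lexicon, the specific features, and all thresholds here are hypothetical placeholders.

```python
# Sketch of incremental feature sets: comment features plus user-context
# features. PROFANE_WORDS is a toy placeholder lexicon, not the one used
# in the study.
PROFANE_WORDS = {"idiot", "loser", "stupid"}

def content_features(comment: str) -> dict:
    """Feature set 1: features of the individual comment only."""
    tokens = comment.lower().split()
    profane = sum(1 for t in tokens if t.strip(".,!?") in PROFANE_WORDS)
    return {
        "length": len(tokens),
        "profanity_ratio": profane / max(len(tokens), 1),
        "capital_ratio": sum(c.isupper() for c in comment) / max(len(comment), 1),
    }

def user_features(history: list) -> dict:
    """User-context features computed over the user's comment history
    (in the study: up to 6 months, on average 54 comments per user)."""
    ratios = [content_features(c)["profanity_ratio"] for c in history] or [0.0]
    return {
        "history_size": len(history),
        "mean_history_profanity": sum(ratios) / len(ratios),
    }

def feature_vector(comment: str, history: list) -> dict:
    """Incremental set: comment features extended with user context."""
    feats = content_features(comment)
    feats.update(user_features(history))
    return feats
```

Such dictionaries could then be vectorized and fed to any supervised classifier; the point is only that the richer feature vector carries information an individual comment cannot.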
Abstract. Cyberbullying is becoming a major concern in online environments, with troubling consequences. However, most technical studies have focused on detecting cyberbullying by identifying harassing comments rather than preventing incidents by detecting the bullies. In this work we study the automatic detection of bully users on YouTube. We compare three types of automatic detection: an expert system, supervised machine learning models, and a hybrid type combining the two. All these systems assign a score indicating the level of "bulliness" of online bullies. We demonstrate that the expert system outperforms the machine learning models, and that the hybrid classifier performs better still.

Introduction

With the growth of the Internet as a social medium, a new form of bullying has emerged, called cyberbullying. Cyberbullying is defined as an aggressive, intentional act carried out by a group or individual, using electronic forms of contact repeatedly and over time against a victim who cannot easily defend him- or herself [1]. One of the most common forms is the posting of hateful comments about someone on social networks. Many social studies have been conducted to provide support and training for adults and teenagers [2,3]. The majority of the existing technical studies on cyberbullying have concentrated on the detection of bullying or harassing comments [4-6], while work on the more challenging task of detecting the cyberbullies themselves is largely missing. There are a few exceptions, however, that point out an interesting direction for incorporating user information in detecting offensive content, but more advanced user information or personal characteristics, such as writing style or network activities, have not been included in these studies [7,8].
Cyberbullying prevention based on user profiles was addressed for the first time in our latest study, in which an expert system was developed that assigns scores to social network users to indicate their level of 'bulliness' and their potential for future misbehaviour based on the history of their activities [9]. In that work we did not investigate machine learning models. In this study we again focus on the detection of bully users in online social networks, but now we examine the effectiveness of both expert systems and machine learning models for identifying potential bully users. We compare the performance of both systems on the task of assigning a score to social network users that indicates their level of bulliness. We demonstrate that the expert system outperforms the machine learner and that the two can be effectively combined in a hybrid classifier. The approach we propose can be used for building monitoring tools to stop potential bullies from causing further harm.

Data Collection and Feature Selection

In this section we explain the characteristics of the corpus used in this study. We also describe the feature space and the three feature categories that have been used...
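The hybrid idea, blending an expert system's rule-based score with a machine learning classifier's probability into a single bulliness score, can be sketched as below. The rules, thresholds, and mixing weight are all hypothetical stand-ins; the paper's actual rule base and combination scheme are not reproduced here.

```python
def expert_score(profanity_ratio: float, n_flagged_comments: int) -> float:
    """Toy expert rules producing a bulliness score in [0, 1].
    Thresholds are illustrative placeholders, not the paper's rules."""
    score = 0.0
    if profanity_ratio > 0.2:      # frequent profanity in recent comments
        score += 0.5
    if n_flagged_comments >= 3:    # repeated moderator flags
        score += 0.5
    return min(score, 1.0)

def hybrid_score(expert: float, ml_prob: float, weight: float = 0.6) -> float:
    """Linear blend of the expert score and the classifier's predicted
    probability; `weight` is a hypothetical mixing parameter."""
    return weight * expert + (1.0 - weight) * ml_prob
```

A monitoring tool could rank users by this combined score and surface the highest-scoring accounts to moderators.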
Cyberbullying is a disturbing form of online misbehaviour with troubling consequences. It appears in different forms, and on most social networks it is textual. Automatic detection of such incidents requires intelligent systems. Most existing studies have approached this problem with conventional machine learning models, and the majority of the models developed in these studies are adaptable to only a single social network at a time. In recent studies, deep learning based models have found their way into the detection of cyberbullying incidents, with the claim that they can overcome the limitations of conventional models and improve detection performance. In this paper, we investigate the findings of a recent study in this regard. We reproduced its findings and validated them on the same datasets used by the authors, namely Wikipedia, Twitter, and Formspring. We then expanded our work by applying the developed methods to a new YouTube dataset (~54k posts by ~4k users) and investigated the performance of the models on a new social media platform. We also transferred the models trained on one platform to another platform and evaluated their performance. Our findings show that the deep learning based models outperform the machine learning models previously applied to the same YouTube dataset. We believe that the deep learning based models can also benefit from integrating other sources of information, such as the profile information of users in social networks.
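The cross-platform transfer protocol described above (train on one platform, evaluate on another) can be expressed generically as follows. The training and scoring callables stand in for whatever model is used, deep or conventional; this sketch only captures the evaluation loop, not the models themselves.

```python
def cross_platform_eval(train_fn, score_fn, datasets: dict) -> dict:
    """Train a model on each platform and evaluate it on every other
    platform (e.g. Wikipedia -> YouTube). `train_fn(X, y)` returns a
    fitted model; `score_fn(model, X, y)` returns a performance metric.
    Both are placeholders for the actual model and metric."""
    results = {}
    for src, (X_train, y_train) in datasets.items():
        model = train_fn(X_train, y_train)
        for tgt, (X_test, y_test) in datasets.items():
            if src == tgt:
                continue  # in-platform results are reported separately
            results[(src, tgt)] = score_fn(model, X_test, y_test)
    return results
```

With four platforms this yields a 4x3 grid of transfer scores, making it easy to see which source platform generalizes best to unseen ones.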
With the advent of social networks, friendships, relationships, and social communication have all moved to a new level with new definitions. One may have hundreds of friends without ever seeing their faces. Alongside this transition, there is increasing evidence that online social applications are used by children and adolescents for bullying. State-of-the-art studies in cyberbullying detection have mainly focused on the content of the conversations while largely ignoring the users involved. We propose that incorporating the users' information, their characteristics, and their post-harassing behaviour, for instance posting a new status on another social network as a reaction to a bullying experience, will improve the accuracy of cyberbullying detection. Cross-system analyses of the users' behaviour, monitoring their reactions in different online environments, can facilitate this process and provide information that leads to more accurate detection of cyberbullying.