Abstract. The negative consequences of cyberbullying are becoming more alarming every day, and technical solutions that allow appropriate action to be taken by means of automated detection are still very limited. Up until now, studies on cyberbullying detection have focused on individual comments only, disregarding context such as users' characteristics and profile information. In this paper we show that taking user context into account improves the detection of cyberbullying.

Introduction

More and more teenagers in online communities are exposed to and harmed by cyberbullying. Studies (see footnote 1) show that in Europe about 18% of children have been involved in cyberbullying, leading to severe depression and even suicide attempts. Cyberbullying is defined as an aggressive, intentional act carried out by a group or individual, using electronic forms of contact repeatedly or over time, against a victim who cannot easily defend him- or herself [1]. Besides social measures, technical solutions have to be found to deal with this social problem. At present, social network platforms rely on users alerting network moderators, who in turn may remove bullying comments. This alerting process can be improved by automatically detecting such comments, allowing a moderator to act faster. Studies on automatic cyberbullying detection are few, are typically limited to individual comments, and do not take context into account [2][3].

In this study we show that taking user context, such as a user's comment history and user characteristics [4], into account can considerably improve the performance of detection tools for cyberbullying incidents. We approach cyberbullying detection as a supervised classification task for which we investigated three incremental feature sets.
In the next sections the experimental setup and results will be described, followed by a discussion of related work and conclusions.

Footnote 1: EU COST Action IS0801 on Cyberbullying (https://sites.google.com/site/costis0801/).

M. Dadvar et al.

Experiment

Corpus

YouTube is the world's largest user-generated content site, and its broad scope in terms of audience, videos, and users' comments makes it a platform that is susceptible to bullying and therefore an appropriate platform for collecting datasets for cyberbullying studies. As no cyberbullying dataset was publicly available, we collected a dataset of comments on YouTube movies. To cover a variety of topics, we collected the comments from the top 3 videos in the different categories found on YouTube. For each comment, the user id and the comment's date and time were also stored. Only the users with public profiles (78%) were kept. The final dataset consists of 4626 comments from 3858 distinct users. The comments were manually labelled as bullying (9.7%) and non-bullying based on the definition of cyberbullying in this study (inter-annotator agreement 93%). For each user we collected the comment history, consisting of up to 6 months of comments, on average 54 comments per user.

Feature Space Design

The following three feature sets were...
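The corpus figures reported above (4626 comments, 3858 distinct users, 9.7% labelled as bullying) imply a strongly imbalanced classification problem. The sketch below works through that arithmetic only. It is not part of the authors' method. The function name `corpus_summary` and the idea of a majority-class baseline are assumptions added for illustration, and the bullying count is rounded from the reported percentage rather than taken from the paper.

```python
# Figures taken from the corpus description; everything derived from
# them below is an approximation, since 9.7% is itself a rounded value.
TOTAL_COMMENTS = 4626
DISTINCT_USERS = 3858
BULLYING_SHARE = 0.097  # 9.7% of comments labelled as bullying


def corpus_summary(total=TOTAL_COMMENTS, users=DISTINCT_USERS, share=BULLYING_SHARE):
    """Derive approximate class counts and a majority-class baseline (hypothetical helper)."""
    bullying = round(total * share)      # approximate number of bullying comments
    non_bullying = total - bullying      # remaining comments
    return {
        "bullying": bullying,
        "non_bullying": non_bullying,
        # Accuracy of a classifier that always predicts "non-bullying".
        "majority_baseline_accuracy": non_bullying / total,
        # Labelled comments per user in the final dataset (not the 6-month history).
        "comments_per_user": total / users,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value}")
```

Because a classifier that always answers "non-bullying" would already be right on roughly nine comments in ten, plain accuracy says little on this corpus. Measures that focus on the bullying class, such as precision and recall, are more informative.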
Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, a reference algorithm that was used within MediaEval 2010 is presented, along with comments on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, audio, and visual features.
Work on expressive speech synthesis has long focused on the expression of basic emotions. In recent years, however, interest in other expressive styles has been increasing. The research presented in this paper aims at the generation of a storytelling speaking style, which is suitable for storytelling applications and, more generally, for applications aimed at children. Based on an analysis of human storytellers' speech, we designed and implemented a set of prosodic rules for converting "neutral" speech, as produced by a text-to-speech system, into storytelling speech. An evaluation of our storytelling speech generation system showed encouraging results.
In this paper we describe the 2005 AMI system for the transcription of speech in meetings, used for participation in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference room and lecture style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression, and minimum word error rate decoding. We report the system performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve very competitive performance.