Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, or religion. Due to the massive rise of user-generated web content on social media, the amount of hate speech is also steadily increasing. Over the past years, interest in online hate speech detection and, particularly, the automation of this task has continuously grown, along with the societal impact of the phenomenon. This paper describes a hate speech dataset composed of thousands of sentences manually labelled as containing hate speech or not. The sentences have been extracted from Stormfront (www.stormfront.org), a white supremacist forum. A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it. The paper also provides a thoughtful qualitative and quantitative study of the resulting dataset and several baseline experiments with different classification models. The dataset is publicly available at https://github.com/aitor-garcia-p/hate-speech-dataset. Note: the examples in this work may contain offensive language. They have been taken from actual web data and by no means reflect the authors' opinion.
With the increase of online customer opinions on specialised websites and social networks, automatic systems that help organise and classify customer reviews by domain-specific aspect/category and sentiment polarity are more necessary than ever. Supervised approaches to Aspect Based Sentiment Analysis obtain good results for the domain and language they are trained on, but manually labelling training data for every domain and language is usually very costly and time-consuming. In this work we describe W2VLDA, an almost unsupervised system based on topic modelling which, combined with some other unsupervised methods and a minimal configuration, performs aspect/category classification, aspect-term/opinion-word separation and sentiment polarity classification for any given domain and language. We evaluate the performance of the aspect and sentiment classification on the multilingual SemEval 2016 Task 5 (ABSA) dataset. We show competitive results for several languages (English, Spanish, French and Dutch) and domains (hotels, restaurants, electronic devices).
Customer experiences, in the shape of online reviews, influence other customers and, in general, contribute to building a perception of a destination. This work presents the conclusions of a survey that gathered users' text-based reviews about several categories of destination-related information (accommodation, restaurants, attractions and Points of Interest) from three well-known social media sources (Facebook, FourSquare and GooglePlaces) for eight worldwide destinations with a high overnight rate. Several hypotheses about the correlation between the language and sentiment features of the reviews have been validated over a large dataset of reviews. For example, the analysis detected that the highest number of reviews about a destination is written in the official language of that destination. Furthermore, Dutch-speaking people are more positive when writing a review. Finally, English, Italian and Spanish speakers seem to prefer FourSquare, while German and French speakers are quite evenly distributed between FourSquare and GooglePlaces.