The emergence of anti-social behaviour in online environments presents a serious issue in today’s society. Automatic detection and identification of such behaviour are becoming increasingly important. Modern machine learning and natural language processing methods can provide effective tools to detect different types of anti-social behaviour from the pieces of text. In this work, we present a comparison of various deep learning models used to identify the toxic comments in the Internet discussions. Our main goal was to explore the effect of the data preparation on the model performance. As we worked with the assumption that the use of traditional pre-processing methods may lead to the loss of characteristic traits, specific for toxic content, we compared several popular deep learning and transformer language models. We aimed to analyze the influence of different pre-processing techniques and text representations including standard TF-IDF, pre-trained word embeddings and also explored currently popular transformer models. Experiments were performed on the dataset from the Kaggle Toxic Comment Classification competition, and the best performing model was compared with the similar approaches using standard metrics used in data analysis.
This paper connects two large research areas, namely sentiment analysis and human–robot interaction. Emotion analysis, as a subfield of sentiment analysis, explores text data and, based on the characteristics of the text and generally known emotional models, evaluates what emotion is presented in it. The analysis of emotions in the human–robot interaction aims to evaluate the emotional state of the human being and on this basis to decide how the robot should adapt its behavior to the human being. There are several approaches and algorithms to detect emotions in the text data. We decided to apply a combined method of dictionary approach with machine learning algorithms. As a result of the ambiguity and subjectivity of labeling emotions, it was possible to assign more than one emotion to a sentence; thus, we were dealing with a multi-label problem. Based on the overview of the problem, we performed experiments with the Naive Bayes, Support Vector Machine and Neural Network classifiers. Results obtained from classification were subsequently used in human–robot experiments. Despise the lower accuracy of emotion classification, we proved the importance of expressing emotion gestures based on the words we speak.
This work belongs to the field of sentiment analysis; in particular, to opinion and emotion classification using a lexicon-based approach. It solves several problems related to increasing the effectiveness of opinion classification. The first problem is related to lexicon labelling. Human labelling in the field of emotions is often too subjective and ambiguous, and so the possibility of replacement by automatic labelling is examined. This paper offers experimental results using a nature-inspired algorithm—particle swarm optimization—for labelling. This optimization method repeatedly labels all words in a lexicon and evaluates the effectiveness of opinion classification using the lexicon until the optimal labels for words in the lexicon are found. The second problem is that the opinion classification of texts which do not contain words from the lexicon cannot be successfully done using the lexicon-based approach. Therefore, an auxiliary approach, based on a machine learning method, is integrated into the method. This hybrid approach is able to classify more than 99% of texts and achieves better results than the original lexicon-based approach. The final hybrid model can be used for emotion analysis in human–robot interactions.
In this paper we present an improvement of the precision of classification algorithm results. Two various approaches are known: bagging and boosting. This paper describes a set of experiments with bagging and boosting methods. Our use of these methods aims at classification algorithms generating decision trees. Results of performance tests focused on the use of the bagging and boosting methods in connection with binary decision trees are presented. The minimum number of decision trees, which enables an improvement of the classification performed by the bagging and boosting methods, was found. The tests were carried out using the Reuter?s 21578 collection of documents as well as documents from an Internet portal of TV broadcasting company Mark?za. The comparison of our results on testing the bagging and boosting algorithms is presented.
The article focuses on solving an important problem of detecting suspicious reviewers in online discussions on social networks. We have concentrated on a special type of suspicious authors, on trolls. We have used methods of machine learning for generation of detection models to discriminate a troll reviewer from a common reviewer, but also methods of sentiment analysis to recognize the sentiment typical for troll’s comments. The sentiment analysis can be provided also using machine learning or lexicon-based approach. We have used lexicon-based sentiment analysis for its better ability to detect a dictionary typical for troll authors. We have achieved Accuracy = 0.95 and F1 = 0.80 using sentiment analysis. The best results using machine learning methods were achieved by support vector machine, Accuracy = 0.986 and F1 = 0.988, using a dataset with the set of all selected attributes. We can conclude that detection model based on machine learning is more successful than lexicon-based sentiment analysis, but the difference in accuracy is not so large as in F1 measure.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.