Studies on Natural Language Processing are mainly conducted in English, with very few exploring under-resourced languages, including the Dravidian languages. We present a novel work on detecting offensive language using a corpus of Tamil comments collected from YouTube. The study specifically aims to compare two machine learning approaches, namely supervised and unsupervised, for detecting offensive patterns in textual communications. In the first setup, offensive language detection models were developed using traditional machine learning algorithms such as Random Forest, Logistic Regression, Support Vector Machine and AdaBoost, and assessed against human labeling. In the second setup, we used K-means (K = 2) to cluster the unlabeled data before training the same set of machine learning algorithms to detect offensive communications. Performance scores indicate that unsupervised clustering was more effective than human labeling, with ensemble classifiers achieving accuracies of 99.70% and 99.87% on the balanced and imbalanced datasets, respectively, hence showing that an unsupervised approach can be used effectively to detect offensive language in low-resourced languages.
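The unsupervised setup described above (clustering unlabeled comments with K-means, K = 2, so that the cluster ids can stand in for human labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words features, the deterministic seeding from the first two comments, the helper names `vectorize` and `kmeans`, and the English placeholder comments are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in texts:
        vec = [0.0] * len(vocab)
        for tok, n in Counter(t.lower().split()).items():
            vec[index[tok]] = float(n)
        vectors.append(vec)
    return vectors


def kmeans(vectors, k=2, iterations=20):
    """Plain Lloyd's algorithm. The first k vectors seed the centroids
    (an assumption chosen only to keep this sketch deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Hypothetical placeholder comments; the study itself uses Tamil YouTube comments.
comments = [
    "great video thanks",
    "stupid idiot video",
    "thanks great song",
    "idiot stupid fool",
]

pseudo_labels = kmeans(vectorize(comments), k=2)
for text, label in zip(comments, pseudo_labels):
    print(label, text)
```

The cluster ids carry no meaning on their own, so deciding which of the two clusters corresponds to offensive content still requires inspecting a few members of each. The resulting pseudo-labels would then take the place of human labels when training classifiers such as those named in the abstract.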
With the advent of Web 2.0 technologies such as social media, online text sources provide large-scale data repositories from which valuable knowledge about human emotions can be derived. This paper aims to (i) detect and classify the emotions of the Facebook diabetes community, (ii) examine the relationship between emotions and Facebook reactions, and (iii) identify user reaction predictors for each emotion. A total of 15K posts were randomly selected from several official Facebook diabetes support groups. After pre-processing, 2475 Facebook posts remained for further analysis. Emotion detection was first performed using the Indico API, with results revealing anger, sadness and fear to be the most commonly experienced emotions, whilst love and wow emerged as the highest-ranking reactions. Precision and recall of the emotion detection mechanism ranged between 65–82% across all emotions when compared against human annotation. The average F-score recorded was 78%. Both love and wow were found to significantly predict joy and fear, whereas angry was found to predict anger. The findings indicate that human emotions can be effectively detected from users' textual communication, and that significant relationships exist between several reactions and emotions.
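The per-emotion precision, recall and F-score reported above compare automatically detected emotions against human annotation. A minimal sketch of that comparison follows, assuming one label per post; the function name `precision_recall_f1` and the toy labels are hypothetical and are not drawn from the study's data.

```python
def precision_recall_f1(gold, predicted, label):
    """Score one emotion label, treating human annotation as ground truth."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correct detections
    fp = sum(1 for g, p in pairs if g != label and p == label)  # spurious detections
    fn = sum(1 for g, p in pairs if g == label and p != label)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy labels for six posts (illustration only).
human = ["anger", "fear", "sadness", "anger", "joy", "fear"]
detected = ["anger", "fear", "anger", "anger", "joy", "sadness"]

for emotion in sorted(set(human)):
    p, r, f = precision_recall_f1(human, detected, emotion)
    print(f"{emotion}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Averaging the per-emotion F-scores gives a single summary figure of the kind quoted in the abstract, although the abstract does not state which averaging scheme was used.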