With growing internet access and the ease of writing comments in Nepali, fine-grained sentiment analysis of social media comments is becoming increasingly pertinent. Benchmark datasets exist for high-resource languages (English, French, and German) in specific domains such as restaurants, hotels, or electronic goods, but not for low-resource languages like Nepali. In this paper, we present our work to create a dataset for targeted aspect-based sentiment analysis in the social media domain, set up a benchmark on it, and evaluate various machine learning models. The dataset comprises code-mixed and code-switched comments extracted from Nepali YouTube videos. We present convincing baselines using a multilingual BERT model for the Aspect Term Extraction task and a BiLSTM model for the Sentiment Classification task, achieving F1 scores of 57.978% and 81.60%, respectively.
The increasing amount of Nepali content on the web has opened doors for research and development of a number of Natural Language Processing applications, including Sentiment Analysis (SA). However, to the best of our knowledge, there has been no work in this area for the Nepali language. In this paper, we present two main approaches for sentiment detection in Nepali texts. We have developed a Nepali Sentiment Corpus and a Nepali SentiWordNet. In our first approach, we develop a lexical resource called Bhavanakos, which is a Nepali SentiWordNet, and implement a strategy in which sentiment words are detected in Nepali texts to determine the sentiment of documents. In our second approach, we train a machine-learning-based text classifier on annotated Nepali text data to classify documents.
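The lexicon-based strategy described in this abstract can be illustrated with a minimal sketch. The abstract does not say how Bhavanakos scores words or combines them, so everything below is an assumption made for illustration: the two-entry toy lexicon, the +1/-1 scores, whitespace tokenization, and the hypothetical function name `lexicon_sentiment`.

```python
# Toy polarity lexicon. The entries and scores are illustrative assumptions,
# not actual Bhavanakos content.
TOY_LEXICON = {
    "राम्रो": 1.0,    # "good"
    "नराम्रो": -1.0,  # "bad"
}


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Label a text by summing the polarity scores of the lexicon words it contains."""
    tokens = text.split()  # naive whitespace tokenization (an assumption)
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("यो फिल्म राम्रो छ"))
```

A real system would likely need proper tokenization, handling of inflected forms, and negation; this sketch only shows the word-lookup-and-aggregate idea.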
Aspect-based Sentiment Analysis helps in understanding opinions about associated entities, which in turn helps improve the quality of a service or product. A model is developed to detect aspect-based sentiment in Nepali text using Machine Learning (ML) classifier algorithms, namely Support Vector Machine (SVM) and Naïve Bayes (NB). The system collects Nepali text data from various websites, and Part of Speech (POS) tagging is applied to extract the desired aspect and sentiment features. Each sentence is manually labeled with its sentiment. Term Frequency-Inverse Document Frequency (TF-IDF) is applied to compute the importance of the words. The resulting feature vectors are then passed to the classifier algorithms to predict the class of each sentence. The accuracy obtained by the SVM classifier is 76.8%, whereas that of Bernoulli NB is 77.5%.
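The TF-IDF weighting step mentioned in this abstract can be sketched as follows. The abstract does not state which TF-IDF variant was used, so this sketch assumes the plain textbook form: term frequency as count divided by sentence length, and inverse document frequency as log(N / df) with no smoothing. The function name `tf_idf` and the English placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Placeholder tokens standing in for tokenized Nepali sentences.
sentences = [["food", "good"], ["food", "bad"]]
weights = tf_idf(sentences)
```

With this variant, a term that occurs in every sentence ("food" here) gets weight 0, while terms that distinguish sentences get positive weight. These per-sentence weight dictionaries would then be turned into fixed-length vectors before being passed to a classifier such as SVM or Naïve Bayes.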