Sentiment analysis on views and opinions expressed in Indian regional languages has become the current focus of research. But, compared to a globally accepted language like English, research on sentiment analysis in Indian regional languages like Malayalam are very low. One of the major hindrances is the lack of publicly available Malayalam datasets. This work focuses on building a Malayalam dataset for facilitating sentiment analysis on Malayalam texts and studying the efficiency of a pre-trained deep learning model in analyzing the sentiments latent in Malayalam texts. In this work, a Malayalam dataset has been created by extracting 2,000 tweets from Twitter. The bidirectional encoder representations from transformers (BERT) is a pretrained model that has been used for various natural language processing tasks. This work employs a transformer-based BERT model for Malayalam sentiment analysis. The efficacy of BERT in analyzing the sentiments latent in Malayalam texts has been studied by comparing the performance of BERT with various machine learning models as well as deep learning models. By analyzing the results, it is found that a substantial increase in accuracy of 5% for BERT when compared with that of Bi-GRU, which is the next bestperforming model.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.