Sentiment analysis concerns the classification of sentiments expressed in review documents. To improve classification accuracy, feature selection methods are often used to rank features so that non-informative and noisy features with low ranks can be removed. In this study, we propose a new feature selection method, called query expansion ranking, which is based on query expansion term weighting methods from the field of information retrieval. We compare our proposed method with other widely used feature selection methods, including chi-square, information gain, document frequency difference, and optimal orthogonal centroid, using four classifiers: naïve Bayes multinomial, support vector machines, maximum entropy modelling, and decision trees. We test them on movie reviews and several kinds of product reviews in both Turkish and English to show their performance across different domains, languages, and classifiers. We observe that our proposed method achieves consistently better performance than the other feature selection methods, and that query expansion ranking, chi-square, information gain, and document frequency difference tend to produce better results for both the English and Turkish reviews when tested with the naïve Bayes multinomial classifier.
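To make the idea of ranking features and discarding low-ranked ones concrete, the following is a minimal sketch of one of the baseline methods named above, chi-square, applied to binary term presence in positive and negative reviews. It is not the proposed query expansion ranking method, whose exact weighting formula is not given here. The function names `chi_square_scores` and `select_top_k`, the 1/0 label encoding, and the use of document-level presence rather than term counts are illustrative assumptions.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of term presence vs. class.

    Hypothetical helper: docs is a list of token lists, labels is a list of
    1 (positive review) or 0 (negative review).
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Per-class document frequency: count each term at most once per document.
    df_pos = Counter()
    df_neg = Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            if label == 1:
                df_pos[term] += 1
            else:
                df_neg[term] += 1
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the 2x2 table is degenerate; score it 0.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [["great", "movie"], ["great", "plot"],
            ["boring", "movie"], ["boring", "plot"]]
    labels = [1, 1, 0, 0]
    print(select_top_k(docs, labels, 2))  # ['boring', 'great']
```

In this toy example, "great" and "boring" each occur in only one class and receive the maximum score, while "movie" and "plot" appear equally in both classes and score 0, so a top-k cut removes them as non-informative features.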