Automatic Text Classification is a semi-supervised machine learning task that automatically assigns a given document to a set of pre-defined categories based on its textual content and extracted features. It has important applications in content management, contextual search, opinion mining, product review analysis, spam filtering, and text sentiment mining. This paper explains the generic strategy for automatic text classification and surveys existing solutions to major issues such as dealing with unstructured text, handling a large number of attributes, and selecting a machine learning technique appropriate to the text-classification application.
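As a rough illustration of the generic strategy described above (turning unstructured text into tokens, trimming the attribute set, and training a classifier), the following is a minimal sketch only. The stop-word list, the toy training data, and the choice of a multinomial Naive Bayes classifier with add-one smoothing are assumptions made for illustration; they are not taken from the survey.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "this", "it"}


def tokenize(text):
    """Lowercase the text, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def train(docs):
    """docs: list of (text, label) pairs. Returns the counts Naive Bayes needs."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # token counts per class
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log-posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total_tokens = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (illustrative only).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status and meeting notes", "ham"),
]
MODEL = train(TRAIN)

print(classify("free prize offer", MODEL))
print(classify("monday meeting notes", MODEL))
```

Stop-word removal here stands in for the attribute-reduction step; a practical system would more likely rely on stemming and a statistical feature-selection measure.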
The growth of E-commerce has led to the emergence of many websites that market and sell products and also allow users to post reviews. Online buyers typically consult these reviews before making a purchase decision, so automatic summarization of users' reviews has great commercial significance. However, since product reviews are written by non-experts in unstructured natural-language text, summarizing them is challenging. This paper presents a semi-supervised approach for mining online user reviews to generate comparative feature-based statistical summaries that can guide a user in making an online purchase. The approach comprises several phases: preprocessing, feature extraction and pruning, feature-based opinion summarization, and overall opinion sentiment classification. Empirical studies indicate that the approach can identify opinionated sentences from blog reviews with a high average precision of 91% and can classify the polarity of the reviews with a good average accuracy of 86%.
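The idea of a feature-based statistical summary can be sketched as follows. This is a simplified illustration, not the method of the paper: it assumes a fixed, hypothetical list of product features and a tiny hypothetical opinion lexicon, whereas the approach described above extracts and prunes features from the reviews themselves.

```python
import re
from collections import defaultdict

# Hypothetical product features and opinion lexicon (illustrative only).
FEATURES = {"battery", "screen", "camera"}
POSITIVE = {"good", "great", "excellent", "sharp"}
NEGATIVE = {"bad", "poor", "weak", "dim"}


def summarize(reviews):
    """Count positive and negative opinionated sentences per product feature."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for review in reviews:
        # Crude sentence splitting on terminal punctuation.
        for sentence in re.split(r"[.!?]+", review.lower()):
            tokens = set(re.findall(r"[a-z]+", sentence))
            pos = len(tokens & POSITIVE)
            neg = len(tokens & NEGATIVE)
            if pos == neg:
                continue  # no opinion words, or a tie: sentence is not counted
            polarity = "positive" if pos > neg else "negative"
            for feature in tokens & FEATURES:
                summary[feature][polarity] += 1
    return {feature: dict(counts) for feature, counts in summary.items()}


REVIEWS = [
    "The battery is great. The screen is dim.",
    "Great camera! Battery life is poor.",
]
SUMMARY = summarize(REVIEWS)
for feature in sorted(SUMMARY):
    print(feature, SUMMARY[feature])
```

Running the same function over the reviews of two competing products would give per-feature counts that can be placed side by side for comparison.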
Nowadays, there are several websites that allow customers to buy products and post reviews of their purchases, which results in the steady accumulation of a large number of reviews written in natural language. Moreover, familiarity with E-commerce and social media has raised the level of sophistication of online shoppers, and it is common practice for them to compare competing brands of products before making a purchase. Prevailing factors such as the availability of online reviews and raised end-user expectations have motivated the development of opinion mining systems that can automatically classify and summarize users' reviews. This paper proposes an opinion mining system that can be used for both binary and fine-grained sentiment classification of user reviews. Feature-based sentiment classification is a multistep process that involves preprocessing to remove noise, extraction of features and corresponding descriptors, and tagging their polarity. The proposed technique extends the feature-based classification approach to incorporate the effect of various linguistic hedges by using fuzzy functions to emulate the effect of modifiers, concentrators, and dilators. Empirical studies indicate that the proposed system can perform reliable sentiment classification at various levels of granularity, with a high average accuracy of 89% for binary classification and 86% for fine-grained classification.
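To make the notion of concentrators and dilators concrete, the sketch below uses the classical fuzzy-set hedge operators: concentration squares a membership degree and dilation takes its square root. This is an assumption made for illustration; the abstract does not state which fuzzy functions the proposed system uses, and the hedge-word mapping and the example membership value are hypothetical.

```python
import math


def concentrate(mu):
    """Classical concentration operator: lowers memberships below 1 (mu squared)."""
    return mu ** 2


def dilate(mu):
    """Classical dilation operator: raises memberships below 1 (square root of mu)."""
    return math.sqrt(mu)


# Hypothetical mapping from hedge words to operators (illustrative only).
HEDGES = {
    "very": concentrate,
    "extremely": concentrate,
    "somewhat": dilate,
    "slightly": dilate,
}


def apply_hedges(hedge_words, mu):
    """Apply hedges to a membership degree in [0, 1], innermost hedge first.

    For "very somewhat good", hedge_words is ["very", "somewhat"], so
    "somewhat" is applied before "very". Unknown words leave mu unchanged.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("membership degree must lie in [0, 1]")
    for word in reversed(hedge_words):
        operator = HEDGES.get(word)
        if operator is not None:
            mu = operator(mu)
    return mu


# Hypothetical membership of the descriptor "good" in the fuzzy set "good".
MU_GOOD = 0.64
print(apply_hedges(["very"], MU_GOOD))
print(apply_hedges(["somewhat"], MU_GOOD))
```

Under these classical definitions a concentrator makes the set stricter, so a descriptor's membership drops, and a dilator relaxes it; a sentiment system may map the hedged value to an intensity score differently.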