In on-line reviews, authors often use a short passage to describe the overall feeling about a product or a service. A review as a whole can mention many details not in line with the overall feeling, so capturing this key passage is important to understand the overall sentiment of the review. This paper investigates the use of extractive summarisation in the context of sentiment classification. The aim is to find the summary sentence, or the short passage, which gives the overall sentiment of the review, filtering out potential noisy information. Experiments on a movie review dataset show that subjectivity detection plays a central role in building summaries for sentiment classification. Subjective extracts carry the same polarity of the full text reviews, while statistical and positional approaches are not able to capture this aspect.
Text Classification systems are able to deal with large datasets, spending less time and human cost compared with manual classification. This is achieved, however, in expense of loss in quality. Semi-Automatic Text Classification (SATC) aims to achieve high quality with minimum human effort by ranking the documents according to their estimated certainty of being correctly classified. This paper introduces the Document Difficulty Framework (DDF), a unification of different strategies to estimate the document certainty, and its application to SATC. DDF exploits the scores and thresholds computed by any given classifier. Different metrics are obtained by changing the parameters of the three levels the framework is lied upon: how to measure the confidence for each document-class (evidence), which classes to observe (class) and how to aggregate this knowledge (aggregation). Experiments show that DDF metrics consistently achieve high error reduction with large portions of the collection being automatically classified. Furthermore, DDF outperforms all the reported SATC methods in the literature. This research focuses on an intermediate solution using Semi-Automatic Text Classification (SATC) [8, 1]. The main goal is to achieve high quality with minimum human
Many on-line services allow users to describe their opinions about a product or a service through a review. In order to help other users to find out the major opinion about a given topic, without the effort to read several reviews, multi-document summarisation is required. This research proposes an approach for extractive summarisation, supporting different scoring techniques, such as cosine similarity or divergence, as a method for finding representative sentences. The main contribution of this paper is the definition of an algorithm for sentence removal, developed to maximise the score between the summary and the original document. Instead of ranking the sentences and selecting the most important ones, the algorithm iteratively removes unimportant sentences until a desired compression rate is reached. Experimental results show that variations of the sentence removal algorithm provide good performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.