Sentiment classification has received increasing attention in recent years. Supervised learning methods for sentiment classification require a considerable amount of labeled data for training. As the number of domains increases, collecting such data becomes impractical, so domain adaptation techniques are employed. However, most studies on domain adaptation require a small amount of labeled data or a large amount of unlabeled data from the target domain, which may not always be available. In this work, a novel method for sentiment classification that requires neither labeled nor unlabeled data from the target domain is proposed. The proposed method consists of two main stages. First, the target domain is predicted, even if it is not among the available source domains. Then, sentiment is classified as either positive or negative using the sentiment classifier trained specifically for the predicted domain. Extensive experimental analysis on two datasets with distinct languages and domains verifies that the proposed method is superior to the domain-independent sentiment classification approach in every case considered.
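The two-stage idea can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "predicting the domain" amounts to picking the closest source domain with a text classifier, and it uses a hypothetical `TinyNB` class and invented toy reviews.

```python
import math
from collections import Counter, defaultdict


class TinyNB:
    """Minimal multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled source-domain data (invented for this sketch).
SOURCE = {
    "books": [
        ("a gripping story with great characters", "pos"),
        ("dull plot and boring characters", "neg"),
    ],
    "electronics": [
        ("battery lasts long and screen is sharp", "pos"),
        ("battery died and screen cracked", "neg"),
    ],
}

# Stage 1 model: predicts which source domain a review most resembles.
domain_clf = TinyNB().fit(
    [text for reviews in SOURCE.values() for text, _ in reviews],
    [domain for domain, reviews in SOURCE.items() for _ in reviews],
)

# Stage 2 models: one sentiment classifier per source domain.
sentiment_clfs = {
    domain: TinyNB().fit([t for t, _ in reviews], [s for _, s in reviews])
    for domain, reviews in SOURCE.items()
}


def classify(review):
    """Route the review to the sentiment classifier of its predicted domain."""
    domain = domain_clf.predict(review)
    return domain, sentiment_clfs[domain].predict(review)
```

No target-domain data is used at training time here; an unseen review is simply routed to the sentiment model of whichever source domain it is predicted to belong to.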
Author identification, a popular topic in text classification and natural language processing, aims to determine the author of a given text through various analyses. In the literature, different text representation approaches and preprocessing steps have been considered for the author identification problem. This paper comprehensively examines the impact of text representation and preprocessing steps on author identification, specifically for the Turkish language. For this purpose, the contributions of all possible combinations of text representation approaches, namely unigram and bigram, and preprocessing tasks, namely stemming and stop-word removal, to author identification performance are investigated. For the experimental evaluation, a new dataset is constructed, and two classification algorithms, Multinomial Naive Bayes and Sequential Minimal Optimization, are employed. The results of the experimental analysis reveal that using bigram features alone should be avoided. They also show that stop-words should be kept in the text, while stemming may be preferred depending on the classification algorithm, in order to achieve higher author identification performance.
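The experimental grid described above can be illustrated with a small sketch. The stop-word list, the suffix-stripping `toy_stem` function, and the assumption that the representations are unigram, bigram, and their union are all hypothetical stand-ins chosen for illustration; a real study would use a proper Turkish stemmer and stop-word list.

```python
from itertools import product

STOP_WORDS = {"ve", "bir", "bu"}  # tiny hypothetical Turkish stop-word list
SUFFIXES = ("lar", "ler")         # toy plural suffixes; not a real stemmer


def toy_stem(token):
    """Strip one plural suffix if enough of the word remains (illustrative only)."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def extract_features(text, ngram_sizes, stem, remove_stop_words):
    """Return word n-gram features under one preprocessing configuration."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    features = []
    for n in ngram_sizes:
        features += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return features


# Assumed representation options: unigram, bigram, and both together.
REPRESENTATIONS = {"unigram": (1,), "bigram": (2,), "unigram+bigram": (1, 2)}

# Every combination of representation, stemming on/off, stop-word removal on/off.
CONFIGS = list(product(REPRESENTATIONS, (False, True), (False, True)))

for name, stem, remove_stop_words in CONFIGS:
    feats = extract_features(
        "bu kitaplar ve yazarlar", REPRESENTATIONS[name], stem, remove_stop_words
    )
    print(name, stem, remove_stop_words, feats)
```

Each configuration would then feed a classifier, and the resulting scores would be compared across the grid.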
Recommender systems have recently become a significant part of e-commerce applications. Among the different types of recommender systems, collaborative filtering is the most popular and successful approach for providing recommendations. Recent studies have shown that using multi-criteria ratings helps the system understand customers better. However, adding multiple criteria to collaborative filtering introduces new challenges such as scalability and sparsity. Additionally, revealing the relation between criteria is yet another optimization problem. Hence, increasing prediction accuracy is a challenge. In this paper, an aggregation-function-based multi-criteria collaborative filtering system using Rough Sets Theory is proposed as a novel approach. Rough Sets Theory is used to uncover the relationship between the overall criterion and the individual criteria. Experimental results show that the proposed model (RoughMCCF) successfully improves predictive accuracy without compromising online performance.
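To make the aggregation-function idea concrete, the sketch below learns how an overall rating depends on the individual criteria ratings. It deliberately swaps in a plain least-squares linear aggregation in place of Rough Sets Theory, so it is not RoughMCCF; the function names and the toy ratings are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_aggregation(criteria_rows, overall):
    """Least-squares weights w such that overall ~ sum(w_k * criterion_k)."""
    k = len(criteria_rows[0])
    xtx = [[sum(r[i] * r[j] for r in criteria_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(criteria_rows, overall)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def aggregate(weights, criteria):
    """Combine (predicted) criteria ratings into an overall rating."""
    return sum(w * c for w, c in zip(weights, criteria))


# Hypothetical known ratings: three criteria per item plus an overall score.
criteria_rows = [[5, 4, 3], [2, 5, 1], [4, 1, 5], [3, 3, 2]]
overall = [4.3, 2.7, 3.3, 2.8]

weights = fit_aggregation(criteria_rows, overall)
# In a full system the criteria below would come from per-criterion
# collaborative filtering predictions; here they are given directly.
predicted_overall = aggregate(weights, [4, 4, 4])
```

The learned weights indicate how strongly each criterion drives the overall rating, which is the relationship the abstract says Rough Sets Theory is used to uncover.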
Opinion target extraction is one of the core tasks in sentiment analysis of text data. In recent years, dependency parser-based approaches have been commonly studied for opinion target extraction. However, dependency parsers are limited by language and grammatical constraints. Therefore, in this work, a sequential pattern-based rule mining model, which does not have such constraints, is proposed for cross-domain opinion target extraction from product reviews in unknown domains. As a result, knowing the domain of the reviews is no longer required when extracting opinion targets. The proposed model also clarifies the difference between the concepts of opinion target and aspect, which are commonly confused in the literature. The model consists of two stages. In the first stage, the aspects of reviews are extracted from the target domain using rules automatically generated from the source domains. Aspects are also transferred from the source domains to the target domain, and aspect pruning is applied to further improve aspect extraction performance. In the second stage, the opinion target is selected from among the aspects extracted in the first stage using rules automatically generated for opinion target extraction. The proposed model was evaluated on several benchmark datasets from different domains and compared against previous work. The experimental results show that the opinion targets of reviews in unknown domains can be extracted with higher accuracy than in previous work.
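The first stage (mining rules in source domains and applying them to an unseen domain) can be sketched in a heavily simplified form. This is not the authors' model: it assumes rules are just frequent two-tag part-of-speech patterns ending at a labeled aspect, and the function names, tags, and sentences are invented for illustration.

```python
from collections import Counter


def mine_rules(tagged_sentences, min_support=2):
    """Collect (previous POS, aspect POS) patterns seen at labeled aspects."""
    counts = Counter()
    for sentence in tagged_sentences:
        for i, (_, pos, is_aspect) in enumerate(sentence):
            if is_aspect and i > 0:
                counts[(sentence[i - 1][1], pos)] += 1
    # Keep only patterns that occur at least min_support times.
    return {pattern for pattern, n in counts.items() if n >= min_support}


def extract_aspects(tagged_sentence, rules):
    """Apply mined patterns to an unlabeled (word, POS) sentence."""
    return [
        word
        for i, (word, pos) in enumerate(tagged_sentence)
        if i > 0 and (tagged_sentence[i - 1][1], pos) in rules
    ]


# Hypothetical source-domain sentences: (word, POS tag, is labeled aspect).
source = [
    [("the", "DET", False), ("battery", "NOUN", True),
     ("is", "VERB", False), ("great", "ADJ", False)],
    [("the", "DET", False), ("screen", "NOUN", True),
     ("is", "VERB", False), ("dim", "ADJ", False)],
    [("great", "ADJ", False), ("camera", "NOUN", True)],
]
rules = mine_rules(source)

# Hypothetical review sentence from a domain not seen during mining.
target = [("the", "DET"), ("plot", "NOUN"), ("was", "VERB"), ("slow", "ADJ")]
aspects = extract_aspects(target, rules)
```

Because the rules refer only to tag patterns and not to domain vocabulary, they can fire on words such as "plot" that never appeared in the source reviews.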