A B S T R A C T :Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second state which itself focusses on de-tecting opinionated documents . We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong si-milarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then cal-culate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analysis document and reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several ex-periments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and Internet Movie Data Bases (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collection. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different com-binations of opinion and retrieval scores. We then use this data to deduce the best opinion de-tection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
In this paper, we are interested in defining the gender of blogger while using only texts written from bloggers. For that purpose, we offer a number of features based on specific words, which were categorized into classes. For each blog, a score is calculated based on these characteristics, thereby determining the gender of its author. The evaluation was made on a corpus of 681,288 Blogs (140 million words) tagged as men or women. In our work, this collection will be taken as a reference. The obtained results show gender detection over 82% compared to the referenced collection.
This article describes approaches for searching opinionated documents for a given query from a standard data collection. To detect if a text is opinionated (i.e., contain subjective information) or not, we propose two methods: the first method is based on lexicons of subjective words (i.e., SentiWordNet) supported by the assumption that more a document contains the subjective terms more it has the tendency of being an opinionated document while the second method is based on probabilistic model supporting the idea that given a document having a strong similarity with a reference opinionated text is more likely to be opinionated. In the second method, we take support of language modeling approach to compute this similarity. Experiments are conducted with TREC Blog06 as the test collection and the IMDB data collection as being the reference data collection. The experimental results report effectiveness of both methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.