2015
DOI: 10.1007/978-3-319-18117-2_21

Detection of Opinion Spam with Character n-grams

Abstract: In this paper we consider the detection of opinion spam as a stylistic classification task because, given a particular domain, deceptive and truthful opinions are similar in content but differ in the way they are written (style). In particular, we propose using character n-grams as features, since they have been shown to capture lexical content as well as stylistic information. We evaluated our approach on a standard corpus composed of 1600 hotel reviews, considering positive and negative reviews. We…
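
As an illustration of the feature type the abstract names, here is a minimal sketch of character n-gram extraction. It is not the authors' implementation: the function names `char_ngrams` and `ngram_counts`, the lowercasing step, and the sample string are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of `text`, in order.

    Spaces and punctuation are kept, so an n-gram can span a word
    boundary and pick up stylistic cues as well as word fragments.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # A text shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text: str, n: int) -> Counter:
    """Count character n-grams after lowercasing (an assumed normalization)."""
    return Counter(char_ngrams(text.lower(), n))


# Hypothetical sample text, invented for this sketch.
sample = "the room was great"
print(char_ngrams(sample, 4)[:3])  # ['the ', 'he r', 'e ro']
```

The resulting counts can be used as a bag-of-n-grams feature vector for any standard classifier.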

Citations: cited by 44 publications (45 citation statements)
References: 19 publications
“…Different linguistic features can be considered; the simplest ones are either individual words or n-grams (i.e., groups of n words), which can be either weighted or unweighted. For a complete overview on the multiple kinds of linguistic features that have been proposed in the literature, refer to Crawford et al. The majority of purely content-based approaches are based on supervised learning techniques …”
Section: Approaches To Credibility Assessment (mentioning)
confidence: 99%
“…On the same dataset, Banerjee and Chua use logistic regression (LR) to build a classification model by considering the readability of a review (i.e., its complexity and reading difficulty), the distribution of POS tags, and the review writing style, i.e., positive cues, perceptual words, and usage of future tense. Fusilier et al. use both character n-grams and word n-grams, obtaining the best results with character n-grams with values for n of 4 and 5, respectively, by using NB and an SVM classifier on Ott et al.'s dataset…”
Section: Approaches To Credibility Assessment (mentioning)
confidence: 99%
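
The combination mentioned in the statement above (character n-grams fed to a Naive Bayes classifier) can be sketched as follows. This is a toy multinomial Naive Bayes with add-one smoothing, written for illustration only; it is not the cited authors' setup. The class name `CharNgramNB`, the default `n=4`, the smoothing value `alpha`, and the choice to ignore n-grams unseen in training are all assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Toy multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=4, alpha=1.0):
        self.n = n          # n-gram length (assumed default)
        self.alpha = alpha  # Laplace smoothing constant (assumed)

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # documents per class
        self.counts = {}                   # n-gram counts per class
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        # N-grams never seen in training are skipped (an assumption).
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            denom = sum(counter.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(
                math.log((counter[g] + self.alpha) / denom) for g in grams
            )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A model is trained with `CharNgramNB(n=4).fit(texts, labels)` and queried with `.predict(text)`; on a real corpus, a library classifier such as an SVM would normally take the place of this hand-rolled model.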
“…In the relatively new field of fake review detection, especially in real-life Web sites, much of the previous work focuses on techniques based on review text analysis and behavioral/meta-data analysis by applying supervised, semi-supervised, and unsupervised machine learning techniques. Regarding supervised approaches that take into account the text of the reviews, some works analyze the actual content of the reviews, modeling it with n-gram feature sets. Other works are based on the identification of duplicate reviews, considered as fake reviews.…”
Section: Related Work (mentioning)
confidence: 99%