Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)
DOI: 10.18653/v1/D16-1187
Deceptive Review Spam Detection via Exploiting Task Relatedness and Unlabeled Data

Abstract: Existing work on detecting deceptive reviews primarily focuses on feature engineering and applies off-the-shelf supervised classification algorithms to the problem. A real challenge, then, is to manually collect plentiful ground-truth spam reviews for model building, which is difficult and often requires domain expertise in practice. In this paper, we propose to exploit the relatedness of multiple review spam detection tasks and readily available unlabeled data to address the scarcity of labeled…

Cited by 49 publications (23 citation statements)
References 20 publications

Citation statements, ordered by relevance:
“…Donato et al. [15] applied PU-learning to the problem, using unlabeled data. Hai et al. [16] developed a multi-task learning method based on logistic regression. Feng et al. [17] studied the distributions of rating scores and introduced strategies to create a pseudo-gold-standard dataset.…”
Section: Classification of Deceptive Reviews (mentioning, confidence: 99%)
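The PU-learning idea mentioned in the statement above can be made concrete. The Python sketch below shows one standard PU-learning recipe, the Elkan & Noto calibration method; the cited work may use a different PU variant, and the function names and data layout here are illustrative assumptions, not the cited paper's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pu_fit(X_pos, X_unl, holdout_frac=0.2, seed=0):
    """Elkan & Noto-style PU learning (illustrative; inputs are NumPy arrays).

    X_pos : feature matrix of labeled spam reviews (positives, s=1)
    X_unl : feature matrix of unlabeled reviews (treated as s=0)
    """
    # Hold out part of the positives to estimate c = P(s=1 | y=1).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_pos))
    n_hold = max(1, int(holdout_frac * len(X_pos)))
    X_hold, X_tr = X_pos[idx[:n_hold]], X_pos[idx[n_hold:]]

    # "Non-traditional" classifier g(x) ~ P(s=1 | x): positives vs. unlabeled.
    X = np.vstack([X_tr, X_unl])
    s = np.concatenate([np.ones(len(X_tr)), np.zeros(len(X_unl))])
    g = LogisticRegression(max_iter=1000).fit(X, s)

    # The average score on held-out positives estimates c = P(s=1 | y=1).
    c = g.predict_proba(X_hold)[:, 1].mean()
    return g, c

def pu_predict_proba(g, c, X):
    # Elkan & Noto correction: P(y=1 | x) = P(s=1 | x) / c.
    return np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)
```

The key point is that a classifier trained to separate labeled positives from unlabeled examples can be rescaled into an estimate of the true positive-class probability, which is why unlabeled reviews alone suffice as the negative pool.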
“…By comparison, Ott et al. (2011) used AMT to crowdsource anonymous online turkers to construct a text-based spam review dataset; these reviews are thus real spam. Several text-classification studies have been carried out on this dataset (Feng et al., 2012; Hai et al., 2016). Although expert annotation cannot fully eliminate noise, the spam reviews created by turkers may be more reliable because they are real spam.…”
Section: Deceptive Opinion Spam Detection (mentioning, confidence: 99%)
“…Although expert annotation cannot fully eliminate noise, the spam reviews created by turkers may be more reliable because they are real spam. Because recruiting turkers is expensive, existing datasets of this kind are usually small (Hai et al., 2016). Li et al. (2014) also released another spam review dataset, similar in style to Ott's dataset but still relatively small in scale.…”
Section: Deceptive Opinion Spam Detection (mentioning, confidence: 99%)
“…Some previous work singles out quantity, specificity, diversity, and non-immediacy, as well as task-specific features such as affect, expressivity, complexity, uncertainty, and informality (Zhou et al. 2004; Fuller et al. 2006). Hai et al. (2016) cast review spam detection in different domains (hotel and restaurant) as a multi-task learning problem, sharing knowledge across the per-task models and adding a graph regularizer to each model to incorporate unlabeled data. Mukherjee, Dutta, and Weikum (2017) use a model based on latent topic models in combination with limited metadata to compute a credibility score for reviews and to identify inconsistencies between a review and the overall characterization of an item, both for the item and for each latent facet.…”
Section: Review Spam Detection (mentioning, confidence: 99%)
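The multi-task-plus-graph-regularizer idea attributed to Hai et al. (2016) in the statement above can be illustrated with a toy objective. The sketch below is an assumption-laden illustration rather than the paper's exact formulation: it combines per-task logistic losses, a simple mean-pulling relatedness penalty as one way to share knowledge, and a graph-Laplacian smoothness term over unlabeled reviews; all names and the (T, d) weight layout are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def mtl_graph_objective(W, tasks, graphs, lam_share=0.1, lam_graph=0.1):
    """Toy objective for graph-regularized multi-task logistic regression.

    W      : (T, d) array, one weight vector per task (hypothetical layout)
    tasks  : list of (X_labeled, y) pairs, one per detection task/domain
    graphs : list of (X_unlabeled, L) pairs; L is a graph Laplacian built
             from pairwise similarities among unlabeled reviews (assumption)
    """
    eps = 1e-12
    loss = 0.0
    # Per-task supervised logistic loss on the scarce labeled data.
    for t, (X, y) in enumerate(tasks):
        p = sigmoid(X @ W[t])
        loss += -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # Task relatedness: pull each task's weights toward the mean weights.
    w_bar = W.mean(axis=0)
    loss += lam_share * np.sum((W - w_bar) ** 2)
    # Graph regularizer: predictions on unlabeled reviews should vary
    # smoothly over the similarity graph (the f^T L f smoothness penalty).
    for t, (Xu, L) in enumerate(graphs):
        f = sigmoid(Xu @ W[t])
        loss += lam_graph * float(f @ L @ f)
    return loss
```

Minimizing this objective (for example, by gradient descent) couples the tasks through the shared-mean penalty, while the Laplacian term lets unlabeled reviews shape each task's decision boundary, which is the role the citation statement ascribes to the graph regularizer.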