Proceedings of the 15th International Conference on World Wide Web 2006
DOI: 10.1145/1135777.1135901

Detecting semantic cloaking on the web

Abstract: By supplying different versions of a web page to search engines and to browsers, a content provider attempts to cloak the real content from the view of the search engine. Semantic cloaking refers to differences in meaning between pages which have the effect of deceiving search engine ranking algorithms. In this paper, we propose an automated two-step method to detect semantic cloaking pages based on different copies of the same page downloaded by a web crawler and a web browser. The first step is a filtering step…
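A minimal sketch of the crawler-versus-browser comparison idea is below. It is not the paper's actual pipeline: the User-Agent strings, the Jaccard term-overlap test, and the 0.5 threshold are illustrative assumptions; the paper's real filtering heuristics and second-step classification features are richer than this.

```python
# Sketch: download two copies of a page under different client identities
# and flag large vocabulary differences as candidate semantic cloaking.
# All constants below are assumptions for illustration only.
import re
import urllib.request

CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"  # crawler identity
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"        # browser identity

def fetch(url: str, user_agent: str) -> str:
    """Download one copy of the page, identifying as the given client."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def term_set(html: str) -> set[str]:
    """Crude content signature: the set of lowercased word tokens."""
    return set(re.findall(r"[a-z0-9]+", html.lower()))

def looks_cloaked(url: str, threshold: float = 0.5) -> bool:
    """Filtering-style step: flag the page for closer inspection when the
    crawler and browser copies share too little vocabulary."""
    crawler_terms = term_set(fetch(url, CRAWLER_UA))
    browser_terms = term_set(fetch(url, BROWSER_UA))
    union = crawler_terms | browser_terms
    if not union:
        return False
    jaccard = len(crawler_terms & browser_terms) / len(union)
    return jaccard < threshold  # low overlap => candidate for the classifier step
```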

Cited by 59 publications (40 citation statements) · References 23 publications
“…Web spam detection has been the focus of many research efforts (e.g., [8,11,15,18,21,22]). Most of the work utilizes the characteristics of link structures or content to differentiate spam from normal pages or sites.…”
Section: Related Work
confidence: 99%
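To make the link-versus-content distinction in the statement above concrete, here is a toy sketch of content-based features of the kind a spam classifier might consume; the specific features and their names are illustrative assumptions, not those of any one cited paper.

```python
# Toy content-feature extractor for page-level spam signals.
# Feature choices here are assumptions for demonstration only.
import re

def content_features(html: str) -> dict:
    """Extract a few simple page-level signals from raw page text."""
    words = re.findall(r"[A-Za-z]+", html)
    total_chars = sum(len(w) for w in words)
    return {
        "num_words": len(words),
        "avg_word_len": total_chars / len(words) if words else 0.0,
        "fraction_unique": len({w.lower() for w in words}) / len(words) if words else 0.0,
    }
```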
“…Cloaking techniques are commonly used in search poisoning attacks, which have been analyzed and measured in many existing works [27,28,29]. Wu et al. present the earliest measurement study of cloaking techniques [27].…”
Section: Related Work
confidence: 99%
“…Wu and Davison [16,17] performed several studies of sites that engaged in semantic cloaking. Similar to our research, they impersonated both regular Internet users (as a baseline) and automated crawlers.…”
Section: Related Work
confidence: 99%
“…Past research has identified the HTTP User-Agent header [16,17], the HTTP Referer header [8], and other client characteristics [5] as triggers for cloaking. There is much anecdotal evidence of IP blacklists being compiled and distributed [14,1], and even commercially available databases of crawler IPs [3], but few studies of their mechanics and prevalence.…”
Section: Introduction
confidence: 99%
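Since the statement above names the HTTP User-Agent and Referer headers as cloaking triggers, a hedged sketch of probing a URL under several client identities follows; the probe set and the hash-comparison test are illustrative assumptions, not the measurement methodology of the cited studies.

```python
# Sketch: fetch the same URL under different client identities and compare
# response fingerprints; differing hashes suggest the server keys its
# response on one of these characteristics. Probe values are assumptions.
import hashlib
import urllib.request

PROBES = {
    "crawler-ua": {"User-Agent": "Googlebot/2.1"},
    "browser-ua": {"User-Agent": "Mozilla/5.0"},
    "search-referrer": {"User-Agent": "Mozilla/5.0",
                        "Referer": "https://www.google.com/search?q=example"},
}

def fingerprint(url: str, headers: dict) -> str:
    """Hash the response body so copies can be compared cheaply."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def trigger_report(url: str) -> dict:
    """Return one fingerprint per probe; any mismatch between probes is a
    hint that the server discriminates on User-Agent or Referer."""
    return {name: fingerprint(url, hdrs) for name, hdrs in PROBES.items()}
```

Note that a single mismatch does not prove cloaking: legitimately dynamic pages also vary between fetches, which is why the paper's method starts with a filtering step before any page is judged.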