Proceedings of the 15th International Conference on World Wide Web 2006
DOI: 10.1145/1135777.1135901

Detecting semantic cloaking on the web

Abstract: By supplying different versions of a web page to search engines and to browsers, a content provider attempts to cloak the real content from the view of the search engine. Semantic cloaking refers to differences in meaning between pages which have the effect of deceiving search engine ranking algorithms. In this paper, we propose an automated two-step method to detect semantic cloaking pages based on different copies of the same page downloaded by a web crawler and a web browser. The first step is a filtering step…
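A minimal sketch of the crawler-versus-browser comparison idea is below. It is not the paper's actual pipeline: the User-Agent strings, the Jaccard term-overlap test, and the 0.5 threshold are illustrative assumptions; the paper's real filtering heuristics and second-step classification features are richer than this.

```python
# Sketch: download two copies of a page under different client identities
# and flag large vocabulary differences as candidate semantic cloaking.
# All constants below are assumptions for illustration only.
import re
import urllib.request

CRAWLER_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"  # crawler identity
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"        # browser identity

def fetch(url: str, user_agent: str) -> str:
    """Download one copy of the page, identifying as the given client."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def term_set(html: str) -> set[str]:
    """Crude content signature: the set of lowercased word tokens."""
    return set(re.findall(r"[a-z0-9]+", html.lower()))

def looks_cloaked(url: str, threshold: float = 0.5) -> bool:
    """Filtering-style step: flag the page for closer inspection when the
    crawler and browser copies share too little vocabulary."""
    crawler_terms = term_set(fetch(url, CRAWLER_UA))
    browser_terms = term_set(fetch(url, BROWSER_UA))
    union = crawler_terms | browser_terms
    if not union:
        return False
    jaccard = len(crawler_terms & browser_terms) / len(union)
    return jaccard < threshold  # low overlap => candidate for the classifier step
```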

Cited by 59 publications (40 citation statements) · References 23 publications
“…Web spam detection has been the focus of many research efforts (e.g., [8,11,15,18,21,22]). Most of the work utilizes the characteristics of link structures or content to differentiate spam from normal pages or sites.…”
Section: Related Work
confidence: 99%
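To make the link-versus-content distinction in the statement above concrete, here is a toy sketch of content-based features of the kind a spam classifier might consume; the specific features and their names are illustrative assumptions, not those of any one cited paper.

```python
# Toy content-feature extractor for page-level spam signals.
# Feature choices here are assumptions for demonstration only.
import re

def content_features(html: str) -> dict:
    """Extract a few simple page-level signals from raw page text."""
    words = re.findall(r"[A-Za-z]+", html)
    total_chars = sum(len(w) for w in words)
    return {
        "num_words": len(words),
        "avg_word_len": total_chars / len(words) if words else 0.0,
        "fraction_unique": len({w.lower() for w in words}) / len(words) if words else 0.0,
    }
```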
“…Cloaking techniques are commonly used in search poisoning attacks, which have been analyzed and measured in many existing works [27,28,29]. Wu et al. present the earliest measurement study of cloaking techniques [27].…”
Section: Related Work
confidence: 99%
“…Wu and Davison [16,17] performed several studies of sites that engaged in semantic cloaking. Similar to our research, they impersonated both regular Internet users (as a baseline) and automated crawlers.…”
Section: Related Work
confidence: 99%
“…Past research has identified the HTTP User-Agent header [16,17], the HTTP Referer header [8], and other client characteristics [5] as triggers for cloaking. There is much anecdotal evidence of IP blacklists being compiled and distributed [14,1], and even commercially available databases of crawler IPs [3], but few studies of their mechanics and prevalence.…”
Section: Introduction
confidence: 99%
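Since the statement above names the HTTP User-Agent and Referer headers as cloaking triggers, a hedged sketch of probing a URL under several client identities follows; the probe set and the hash-comparison test are illustrative assumptions, not the measurement methodology of the cited studies.

```python
# Sketch: fetch the same URL under different client identities and compare
# response fingerprints; differing hashes suggest the server keys its
# response on one of these characteristics. Probe values are assumptions.
import hashlib
import urllib.request

PROBES = {
    "crawler-ua": {"User-Agent": "Googlebot/2.1"},
    "browser-ua": {"User-Agent": "Mozilla/5.0"},
    "search-referrer": {"User-Agent": "Mozilla/5.0",
                        "Referer": "https://www.google.com/search?q=example"},
}

def fingerprint(url: str, headers: dict) -> str:
    """Hash the response body so copies can be compared cheaply."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def trigger_report(url: str) -> dict:
    """Return one fingerprint per probe; any mismatch between probes is a
    hint that the server discriminates on User-Agent or Referer."""
    return {name: fingerprint(url, hdrs) for name, hdrs in PROBES.items()}
```

Note that a single mismatch does not prove cloaking: legitimately dynamic pages also vary between fetches, which is why the paper's method starts with a filtering step before any page is judged.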