Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web 2009
DOI: 10.1145/1531914.1531916

Looking into the past to better classify web spam

Abstract: Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However, existing spam detection work only considers current information. We argue that historical web page information may also be important in spam classification. In this paper, we use content features from historical versions of web pages to improve spam classification. We use supervised learning techniques to combine classifiers based on …

Cited by 35 publications (23 citation statements)
References: 14 publications
“…To our knowledge, there has been no prior work discussing the re-use of closed banking websites. However, several researchers have observed that spammers sometimes re-register expired domains in order to benefit from the reputation of the old domain [2,3,4]. For instance, Hao et al found that spammers quickly register recently expired domains, much faster than non-spammers [4].…”
Section: Related Work (mentioning)
confidence: 99%
“…New features based on time series [10,7,8] as well as normalization methods across different snapshots and TLDs are the expected outcome of the proposed tasks.…”
Section: Existing and Expected Filtering Technologies (mentioning)
confidence: 99%
“…As an alternate solution, only feature sets such as "public" [5] can be made available; in this case a precompiled set of content change features based e.g. on [7] should also be compiled.…”
Section: Open Questions (mentioning)
confidence: 99%
“…Wu and Davison [23] expand from a seed set of spam pages to the neighbors to find more suspicious pages in the web graph. Dai et al [5] exploit the historical content information of web pages to improve spam classification, while Chung et al [4] propose to use time series to study the link farm evolution. Martinez-Romo and Araujo [18] apply a language model approach to improve web spam identification.…”
Section: Introduction (mentioning)
confidence: 99%