Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web 2007
DOI: 10.1145/1244408.1244412

Improving web spam classifiers using link structure

Abstract: Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16,12] and link spam [22,13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We descr…

Citations: cited by 41 publications (34 citation statements)
References: 17 publications

“…With spam, hyperlink direction is of great importance since we do not expect genuine hosts to link to spam hosts even when links in the opposite direction are quite common. This has been empirically confirmed in Castillo et al (2007), Gan and Suel (2007). Thus, we will also consider "positive distance squared" as a distortion measure, that is where…”
Section: Learning With Graph Regularization
mentioning
confidence: 63%
“…Another option is to extract link-based metrics for each node and use these as features in any standard classification algorithm. Finally, it has been shown that the link-based information can be used to refine the results of a base classifier by re-labelling using propagation through the hyperlink graph, or a stacked classifier (Castillo et al 2007; Gan and Suel 2007).…”
mentioning
confidence: 99%
“…They can create dummy web sites which link to the website they want to push (link farms), exchange links with other webmasters, buy links on third party web pages, and post links to their websites, for instance, in blogs. To detect link spam, much research has been performed, among others [5][6][7][8][9][10][11][12].…”
Section: Related Work
mentioning
confidence: 99%
“…[5] implemented a classifier to catch a large portion of spam, then several heuristics rules were designed to decide whether a node should be relabeled. [2] summarized the existing content and link based method, detected web spam with machine learning algorithms, then gave some heuristic rules to improve the performance.…”
Section: Introduction
mentioning
confidence: 99%
“…Both [2] and [5] achieved good results with the preliminary machine learning algorithms, but they optimized the detection result with some heuristic rules. As we all know, effective spam detection is essentially an "arms race" between search engines and spamers.…”
Section: Introduction
mentioning
confidence: 99%