Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents 2010
DOI: 10.1145/1871985.1871994

Spam detection with a content-based random-walk algorithm

Abstract: In this work we tackle the problem of spam detection on the Web. Spam web pages have become a problem for Web search engines because of the negative effects this phenomenon can have on their retrieval results. Our approach is based on a random-walk algorithm that obtains a ranking of pages according to their relevance and their spam likelihood. We introduce the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a priori estimation of the spam like…
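
The abstract describes a random walk over the web graph that is informed by a content-derived a priori spam estimate. As a rough illustration only, and not the authors' algorithm, the sketch below runs a PageRank-style power iteration whose teleport step follows a per-page prior instead of a uniform distribution. The function name `biased_random_walk`, the toy graph, and the prior values are all hypothetical; in practice the prior would come from some content analysis of each page.

```python
def biased_random_walk(links, prior, damping=0.85, iterations=50):
    """Power iteration for a random walk whose teleport step follows `prior`.

    links: dict mapping each page to the list of pages it links to
           (every link target must also appear as a key).
    prior: dict mapping each page to a non-negative weight (assumed here
           to be a content-based spam estimate; hypothetical values).
    """
    nodes = list(links)
    total = sum(prior[n] for n in nodes)
    # Normalize the prior into a probability distribution over pages.
    teleport = {n: prior[n] / total for n in nodes}
    score = dict(teleport)
    for _ in range(iterations):
        # Teleport step: jump to a page chosen according to the prior.
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for src in nodes:
            targets = links[src]
            if targets:
                # Follow step: split the page's score evenly over its out-links.
                share = damping * score[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                # Dangling page: redistribute its score according to the prior.
                for n in nodes:
                    nxt[n] += damping * score[src] * teleport[n]
        score = nxt
    return score


# Toy graph (hypothetical): page "s" is suspected spam and links to "a".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "s": ["a"]}
spam_prior = {"a": 0.05, "b": 0.05, "c": 0.05, "s": 0.85}
uniform_prior = {n: 1.0 for n in links}

spam_scores = biased_random_walk(links, spam_prior)
uniform_scores = biased_random_walk(links, uniform_prior)
```

With the spam-biased prior, the walk spends more time on `"s"` than it does under a uniform prior, which is the general effect a content-based prior has on the final ranking. How the prior is computed and how relevance and spam scores are combined are left to the paper itself.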

Cited by 9 publications (5 citation statements)
References 16 publications

“…That is, PageRank considers a node as important only if it is recommended (i.e., connected to) by other important nodes in the graph. The same idea has been widely used for separating legitimate pages from spam pages [56]. Similarly, in our problem context, if an API class co-occurs with other important API classes across multiple code segments that are relevant to a programming task, then this API class is also considered to be important for the task.…”
Section: A Development Of Candidate Api Lists (mentioning)
confidence: 98%
“…For the online voting systems, Benevenuto et al explored YouTube.com to detect spammers who try to increase the reputations of malicious movies by posting a series of responses, and they exploited video attributes (ratings), user attributes (activities), and social-network metrics (clustering coefficient and betweenness) [24]. To enhance the performance of spam detection, several approaches build social-network-based approaches on top of content-based schemes [14,25,26]. Using the network spam-filter features (in- and out-link, cross-link, etc.)…”
Section: A Spam Filtering (mentioning)
confidence: 99%
“…The conventional content-based techniques, which scrutinize the textual content of the pages, and the link-based techniques (Zhou and Pei, 2009; Ortega, et al, 2010) have been used but to little avail primarily because the state-of-art spam techniques has evolved and made spam websites appear very similar to bona fide websites in their link structures (Cheng, at al., 2011). Thus, this suggests that anti-spamming efforts may need to take a collaborative approach within the user community.…”
Section: Literature Review Knowledge Sharing (mentioning)
confidence: 99%