Web spam pollutes search engine results and reduces the usefulness of search engines. It can be classified by the method used to raise a web page's ranking by subverting the algorithms that search engines use to rank results. The main types are content spam, link spam, and cloaking spam. There has been little or no work on automatically classifying web spam by type. This paper makes two contributions: (i) we propose a Dual-Margin Multi-Class Hypersphere Support Vector Machine (DMMH-SVM) classifier for automatically classifying web spam by type, and (ii) we introduce novel cloaking-based spam features that help our classifier achieve high precision and recall, thereby reducing the false positive rate. The effectiveness of the proposed model is justified analytically. Our experimental results demonstrate that DMMH-SVM, combined with the novel cloaking features, outperforms existing algorithms.