Today World Wide Web has become one of best sources of information which is result of faster working of search engines. Web spam attempts to sway search engine algorithm in order to boost the page ranking of specific web pages in search engine results than they deserve. One way to detect web spam is using classification that is learning a classification model for classifying web pages to spam or nonspam. Comparative and empirical analysis of web spam detection using data mining techniques like LAD Tree, JRIP, J48 and Random Forest have been presented in this paper. Experiments were carried out on 3 feature sets of standard dataset WEB SPAM UK-2007. Overall results say that Random forest works well with content based features and transformed link based features however LAD tree was found best among 4 in link based features. But, while thinking about time efficiency LAD Tree was found much more time consuming as compare other 3 classification techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.