Once merely an annoyance of the e-mail era, spam has evolved into a problem that is costly in both resources and time. In this survey, we focus on emerging approaches to spam filtering built on recent developments in computing technologies. These include peer-to-peer computing, grid computing, the Semantic Web, and social networks. We also address a number of perspectives related to personalization and privacy in spam filtering. We conclude that, while important advances have been made in spam filtering in recent years, high-performance approaches remain to be explored because of the large scale of the problem.
Spam continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) based techniques, have been proposed for spam classification. However, SVM training is a computationally intensive process. This paper presents a parallel SVM algorithm for scalable spam filtering. By distributing subsets of the training data across multiple participating nodes and processing and optimizing them there, the distributed SVM reduces the training time significantly. Ontology-based concepts are also employed to minimize the accuracy degradation that arises when the training data is distributed among the SVM classifiers.
Spam, in a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing subsets of the training data across multiple participating computer nodes and processing and optimizing them there, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation that arises when the training data is distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation raises the accuracy of the parallel SVM above that of its sequential counterpart.
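
The general idea of splitting SVM training across nodes can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. It assumes a linear SVM trained by Pegasos-style sub-gradient descent on each partition (the map step) and a simple average of the resulting weight vectors (the reduce step). The function names, the three toy features, and the parameter values are all hypothetical, and the ontology-based augmentation is not modelled.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def train_linear_svm(data: List[Example], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> List[float]:
    """Map step (assumed): Pegasos-style sub-gradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularisation shrink: (1 - eta * lam) equals (1 - 1/t).
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            if margin < 1:  # hinge loss is active for this example
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def partition(data: List[Example], n_parts: int) -> List[List[Example]]:
    """Stratified split so that every partition sees both classes."""
    pos = [ex for ex in data if ex[1] == 1]
    neg = [ex for ex in data if ex[1] == -1]
    return [pos[i::n_parts] + neg[i::n_parts] for i in range(n_parts)]


def reduce_average(models: List[List[float]]) -> List[float]:
    """Reduce step (assumed): average the per-partition weight vectors."""
    return [sum(col) / len(models) for col in zip(*models)]


def parallel_train(data: List[Example], n_parts: int = 4) -> List[float]:
    # In a real MapReduce job each call below would run on a separate node;
    # here the map step is executed sequentially for illustration.
    models = [train_linear_svm(chunk, seed=k)
              for k, chunk in enumerate(partition(data, n_parts))]
    return reduce_average(models)


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (spam) or -1 (ham) from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def make_toy_data(n_per_class: int = 20) -> List[Example]:
    """Hypothetical features: [spam-token count, ham-token count, neutral count]."""
    data: List[Example] = []
    for i in range(n_per_class):
        strong, neutral = 3 + i % 3, i % 4
        data.append(([strong, 0, neutral], 1))   # spam
        data.append(([0, strong, neutral], -1))  # ham
    return data


if __name__ == "__main__":
    weights = parallel_train(make_toy_data(), n_parts=4)
    print(predict(weights, [5, 0, 0]), predict(weights, [0, 5, 0]))
```

Averaging is only one way to combine the partial models; because each one is trained on a fraction of the data, the combined classifier can be less accurate than a sequentially trained one, which is the degradation the abstract sets out to reduce.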