Spam is a growing problem for every internet user, and spam filtering has consequently become a research focus for security researchers and practitioners. Spam filtering can be viewed as a two-class classification problem. To this end, this paper proposes a spam filtering approach, coined WFCM, based on the Possibilistic c-Means (PCM) algorithm and a weighted distance, that can efficiently distinguish between spam and legitimate email messages. The objective of the formulated fuzzy problem is to construct two fuzzy clusters: a spam cluster and a legitimate-email cluster. Feature weights are assigned by the information gain algorithm. Experimental results on a spam benchmark dataset reveal that proper setting of the feature weights can improve the performance of the proposed spam filtering approach. Furthermore, the proposed approach performs better than PCM and the Naïve Bayes filtering technique.

Science, 2017, Vol. 58, No.2C, pp: 1112-1127

Introduction

In the last few decades, electronic mail (email) has become one of the most important means of communication. As a result, many people and companies send vast amounts of unsolicited messages to massive numbers of users. Messages of this type are called spam mail [1]. Spamming floods the Internet with many copies of a single message in an attempt to force the message on people who cannot refuse it [2]. Email is attractive for this purpose because it is easy to use and cost-effective [1, 2], and it is an important carrier for nonperforming commercial advertising, hacker programs, the spread of viruses, and so on [3].

Spam mail causes several problems. First, it wastes network resources, which is a significant cost for network users.
Moreover, it greatly disrupts the daily work of many users: people waste a lot of time dealing with spam, and many spam mails that attract users may in fact contain unexpected malicious attachments that can seriously damage the user's system [4].

There are various anti-spam techniques [2], but spamming techniques change daily, so whatever anti-spam technology is used must be capable of adapting rapidly. A good anti-spam technique has three important characteristics: first, it classifies spam and legitimate mail accurately; second, it is highly adaptable; and finally, it is easily scalable [5].

Spam usually has no precise or absolute definition that distinguishes it from legitimate email. Hence, the discipline of Machine Learning (ML) has recently attracted considerable attention in the design of effective spam filtering functions.

In 2011 [3], two methods were proposed. The first method is used to calculate the similarity between semanti...
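
The abstract describes weighting features by information gain and using a weighted distance inside a possibilistic clustering scheme, but this excerpt does not give the paper's exact formulation. The following is only a minimal sketch of that general idea under stated assumptions: the function names, the toy data, the choice to normalize gains so they sum to 1, and the use of the standard PCM typicality formula with a fixed scale `eta` are all illustrative, not the authors' method.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one discrete feature with respect to the labels."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder


def feature_weights(rows, labels):
    """Assumption: weights are the gains normalized to sum to 1."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels)
             for j in range(n_features)]
    total = sum(gains)
    if total == 0:
        # No feature is informative: fall back to uniform weights.
        return [1.0 / n_features] * n_features
    return [g / total for g in gains]


def weighted_sq_distance(x, center, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))


def typicality(sq_dist, eta, m=2.0):
    """Standard PCM typicality: 1 / (1 + (d^2 / eta)^(1 / (m - 1)))."""
    return 1.0 / (1.0 + (sq_dist / eta) ** (1.0 / (m - 1.0)))


# Hypothetical toy data: two binary features per message.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = ["spam", "spam", "legitimate", "legitimate"]

weights = feature_weights(rows, labels)
spam_center = [1, 0]  # assumed cluster prototype, for illustration only
for row in rows:
    d2 = weighted_sq_distance(row, spam_center, weights)
    print(row, round(typicality(d2, eta=1.0), 3))
```

In this toy data only the first feature separates the two classes, so it receives all of the weight and the second feature no longer affects the distance to the assumed spam prototype.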
Clustering analysis techniques play an important role in data mining. Although several clustering techniques exist, researchers continue to improve the efficiency of the clustering process or to propose new techniques. Such techniques seek to allocate objects into clusters so that two objects in the same cluster are more similar than two objects in different clusters, while taking care not to duplicate the same object in different groups and covering as much of the data as possible. This paper makes two contributions. The first is a new algorithm, named the MB algorithm, that collects unlabeled data and places them into appropriate groups. The second is the creation of a lexical sequence sentence (LCS) based on semantically similar sentences, which differs from the traditional lexical word chain (LCW) based on words. The results showed that the MB algorithm generally outperformed both the hierarchical clustering algorithm and the K-means algorithm.