A new method is proposed for clustering the spam messages collected in the databases of an anti-spam system. A genetic algorithm is developed to solve the clustering problem. The objective function maximizes the similarity between messages within clusters, where similarity is defined using the k-nearest neighbor algorithm. When a genetic algorithm is applied to a constrained problem, the need to keep every chromosome feasible slows convergence. Therefore, to accelerate the convergence of the genetic algorithm, a penalty function is used that keeps infeasible chromosomes from ranking highly when the fitness values are ranked. After classification, knowledge extraction is applied to obtain information about the classes: a multidocument summarization method builds an information portrait of each cluster of spam messages. By classifying and parameterizing spam templates, it will also be possible to determine how topics depend on geography (e.g., which subjects prevail in spam messages sent from certain countries). Thus, the proposed system will be able to reveal purposeful information attacks if they occur. By analyzing the origins of the spam messages in the collection, it is also possible to identify organized social networks of spammers.
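The clustering step above can be illustrated with a small sketch. It assumes a precomputed pairwise similarity matrix `sim` between messages, encodes a chromosome as one cluster label per message, scores it by how many of each message's k nearest neighbours share its cluster, and subtracts a penalty for infeasible chromosomes. The specific constraint used here (every cluster must hold at least `min_size` messages), the penalty weight, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import random

def knn_lists(sim, k):
    """For each message i, return the indices of its k most similar other messages."""
    n = len(sim)
    return [sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])[:k]
            for i in range(n)]

def fitness(chrom, neighbors, n_clusters, min_size=1, penalty=10.0):
    # Reward: number of (message, neighbour) pairs placed in the same cluster.
    score = sum(1 for i, nbrs in enumerate(neighbors) for j in nbrs if chrom[i] == chrom[j])
    # Penalty (assumed constraint): clusters smaller than min_size are infeasible,
    # so such chromosomes are pushed down the ranking instead of being discarded.
    sizes = [chrom.count(c) for c in range(n_clusters)]
    violations = sum(max(0, min_size - s) for s in sizes)
    return score - penalty * violations

def genetic_clustering(sim, n_clusters, k=2, pop_size=30, generations=100,
                       mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(sim)
    neighbors = knn_lists(sim, k)
    # Each chromosome assigns a cluster label to every message.
    pop = [[rng.randrange(n_clusters) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by penalized fitness and keep the better half (elitism).
        pop.sort(key=lambda c: fitness(c, neighbors, n_clusters), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Random-reset mutation of individual labels.
            child = [rng.randrange(n_clusters) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda c: fitness(c, neighbors, n_clusters))
```

Because infeasible chromosomes are penalized rather than repaired, the search can pass through them while still favouring feasible partitions, which is the trade-off the abstract describes.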
In recent years spam has become a major problem for the Internet and electronic communication, and many techniques have been developed to fight it. This paper gives an overview of existing e-mail spam filtering methods. Traditional and learning-based methods are classified, evaluated, and compared. Several personal anti-spam products are tested and compared. Finally, a problem statement for a new approach to spam filtering is considered.
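As one concrete example of the learning-based methods such an overview covers, the following sketch shows a multinomial naive Bayes filter with Laplace smoothing. It is a generic textbook technique chosen for illustration, not a method taken from the paper; the class name, whitespace tokenization, and `alpha` parameter are assumptions.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """Hypothetical minimal multinomial naive Bayes spam filter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        # log P(label) + sum of log P(word | label), smoothed so unseen words are not zero.
        prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        return prior + sum(
            math.log((counts[w] + self.alpha) / (total + self.alpha * len(vocab)))
            for w in words)

    def classify(self, text):
        words = text.lower().split()
        return max(("spam", "ham"), key=lambda lab: self._log_score(words, lab))
```

Unlike a fixed keyword list (a traditional method), this filter changes its behaviour as new labelled messages are passed to `train`.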
Recently the number of unwanted messages arriving by e-mail has increased sharply. Because spam keeps changing, anti-spam systems should be trainable and dynamic. Machine learning has long been applied successfully to filtering unwanted e-mail. This paper proposes applying Case-Based Reasoning (CBR) to the spam filtering problem. Continuously updating the base of spam templates against which newly arriving messages are compared will improve filtering efficiency. By changing the combination of conditions, a flexible filtering system can be built that is adapted to different users or corporations. The paper also considers a second approach: applying CRM technology, which has not yet been used in this area, to spam filtering.
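The CBR idea of comparing each new message against a continuously updated base of templates can be sketched as a retrieve/retain loop. In this sketch, a message is represented as a set of words, Jaccard similarity stands in for the comparison measure, and a per-user `threshold` plays the role of an adjustable filtering condition; all of these, and the class name, are illustrative assumptions rather than the paper's design. The CRM approach is not illustrated.

```python
class SpamCaseBase:
    """Hypothetical case base of labelled message templates."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # minimum similarity needed to reuse a stored case
        self.cases = []             # list of (word set, label)

    @staticmethod
    def similarity(a, b):
        # Jaccard similarity between two word sets.
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def retrieve(self, text):
        """Return the most similar stored case and its similarity score."""
        words = set(text.lower().split())
        scored = [(self.similarity(words, c[0]), c) for c in self.cases]
        if not scored:
            return None, 0.0
        score, case = max(scored, key=lambda sc: sc[0])
        return case, score

    def classify(self, text):
        # Reuse: adopt the label of the closest case if it is similar enough.
        case, score = self.retrieve(text)
        if case is not None and score >= self.threshold:
            return case[1]
        return "unknown"

    def retain(self, text, label):
        # Retain: add a confirmed message as a new case, so the base keeps growing.
        self.cases.append((set(text.lower().split()), label))
```

Giving different users different thresholds (or different case bases) is one simple way to obtain the per-user adaptation the abstract mentions.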
This paper proposes the development of anti-spam software that detects information attacks. For this purpose, a spam filtering system with a multilayered, multivalent architecture that coordinates all ISPs in the country is considered. All users and ISPs of this system are involved in the spam filtering process. After filtering, the saved spam templates are analyzed and classified. Parameterizing the spam templates makes it possible to determine how topics depend on geography, for example, which subjects prevail in spam messages sent from certain countries. By analyzing the origins of the spam templates in the spam base, it is possible to identify organized social networks of spammers. Thus, the proposed system will be able to reveal purposeful information attacks if they occur.
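The topic-versus-geography analysis can be pictured as a simple cross-tabulation of classified spam templates. The sketch assumes each template has already been labelled with a `topic` and an origin `country`; those field names and the helper functions are hypothetical, and the counting is only the bookkeeping step, not the classification itself.

```python
from collections import Counter, defaultdict

def topic_by_country(templates):
    """Count classified spam templates per (country, topic) pair."""
    table = defaultdict(Counter)
    for t in templates:
        table[t["country"]][t["topic"]] += 1
    return table

def dominant_topics(table):
    """For each country, return the topic with the most templates."""
    return {country: counts.most_common(1)[0][0] for country, counts in table.items()}
```

A topic that suddenly dominates the templates from one origin would be a candidate for closer inspection as a possible purposeful information attack.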