This research analyzes the characteristics of spam in depth and aims to propose an efficient spam filtering method that can adapt to a dynamic environment. We focus on the analysis of e-mail headers and apply a decision tree data mining technique to find association rules about spam. We then propose an efficient, systematic filtering method based on these association rules. The method has the following major advantages: (1) It checks only the header section of each e-mail, unlike current spam filtering methods that must fully analyze the e-mail's content; filtering accuracy is also expected to improve. (2) To address the problem of concept drift, we propose a window-based technique that estimates the condition of concept drift for each unknown e-mail, which helps the filtering method recognize spam. (3) We propose an incremental learning mechanism that strengthens the method's ability to adapt to a dynamic environment.
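As a rough illustration of the header-only and window-based ideas, the following minimal sketch extracts a few features from an e-mail's header and tracks the recent error rate over a sliding window. The abstract does not specify which header fields or which drift measure are used, so the feature names, the `DriftWindow` class, and the use of recent misclassification rate as a proxy for drift are all assumptions made here for illustration.

```python
from collections import deque
from email import message_from_string


def header_features(raw_email: str) -> dict:
    """Extract illustrative (hypothetical) features from the header only.

    The message body is parsed but never inspected.
    """
    msg = message_from_string(raw_email)
    subject = msg.get("Subject", "") or ""
    return {
        "has_reply_to": msg.get("Reply-To") is not None,
        "num_received": len(msg.get_all("Received") or []),
        "subject_all_caps": subject.isupper(),
        "missing_message_id": msg.get("Message-ID") is None,
    }


class DriftWindow:
    """Sliding window over recent outcomes (assumed drift proxy: error rate)."""

    def __init__(self, size: int = 100, threshold: float = 0.2):
        # deque with maxlen drops the oldest outcome automatically
        self.outcomes = deque(maxlen=size)
        self.threshold = threshold

    def record(self, predicted_spam: bool, actual_spam: bool) -> None:
        # Store True when the filter was wrong on this e-mail
        self.outcomes.append(predicted_spam != actual_spam)

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self) -> bool:
        # A rising recent error rate is treated as a sign of concept drift
        return self.error_rate() > self.threshold
```

In this sketch a small window reacts quickly to change but is noisy, while a large window is smoother but slower to notice drift; the window size and threshold would need tuning on real data.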
In this paper, we proposed an efficient spam filtering method based on a decision tree data mining technique, analyzed the association rules about spam, and applied these rules to develop a systematic spam filtering method. Our method has three major advantages: (i) It checks only an e-mail's header section, which avoids the low operating efficiency of scanning the e-mail's content while also improving filtering accuracy. (ii) So that a probable misjudgment in identifying an unknown e-mail can be "reversed", we constructed a reversing mechanism to assist the classification of unknown e-mails, which increases the overall accuracy of the filtering method. (iii) The method is equipped with a re-learning mechanism that uses supervised machine learning to collect and analyze each misjudged e-mail. The revision information learned from these misjudged e-mails is fed back into the method incrementally, improving its ability to identify spam.
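The re-learning idea in (iii) can be pictured as a feedback loop: misjudged e-mails are collected with their correct labels, and the model is retrained once enough of them accumulate. The sketch below is only an assumption about how such a loop might look; `train_stump` is a toy depth-1 decision tree standing in for the paper's decision tree learner, and `RelearningFilter`, the `batch` parameter, and the boolean feature dictionaries are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, bool]
Example = Tuple[Features, bool]  # (header features, is_spam)


def train_stump(data: List[Example]) -> Callable[[Features], bool]:
    """Toy depth-1 decision tree (a stand-in, not the paper's learner).

    Picks the single boolean feature whose value most often equals the
    spam label, and predicts spam whenever that feature is True.
    """
    names = sorted(data[0][0])
    best = max(names, key=lambda n: sum(f[n] == y for f, y in data))
    return lambda f: f[best]


class RelearningFilter:
    def __init__(self, seed_data: List[Example], batch: int = 2):
        self.data = list(seed_data)
        self.batch = batch
        self.pending = 0  # misjudged e-mails collected since the last retrain
        self.model = train_stump(self.data)

    def classify(self, features: Features) -> bool:
        return self.model(features)

    def feedback(self, features: Features, actual_spam: bool) -> None:
        """Supervised correction: keep misjudged e-mails, retrain per batch."""
        if self.classify(features) == actual_spam:
            return  # correctly classified, nothing to learn
        self.data.append((features, actual_spam))
        self.pending += 1
        if self.pending >= self.batch:
            self.model = train_stump(self.data)
            self.pending = 0
```

Retraining in small batches rather than after every single mistake is a design choice of this sketch, trading responsiveness for fewer retraining passes.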