Abstract: Email spam, or junk e-mail (unsolicited e-mail "usually of a commercial nature sent out in bulk"), is one of the major problems of today's Internet, causing financial damage to companies and annoying individual users. Among the approaches developed to stop spam, filtering is an important and popular one. Common uses for mail filters include organizing incoming email and removing spam and computer viruses. In the proposed work, we employ supervised machine learning techniques to filter spam email messages. Widely used supervised machine learning techniques, namely the C4.5 decision tree classifier, the Multilayer Perceptron, and the Naïve Bayes classifier, are used to learn the features of spam emails, and the model is built by training on known spam emails and legitimate emails.
Spam is a major concern in present-day email, and there are several reasons for sending it; the two most common are advertising and fraud. A spam detection algorithm, or spam classifier, functions effectively when supported by suitable preprocessing (noise removal, stop-word removal, stemming, lemmatization, term frequency). Spam that combines both text and image components is referred to as hybrid spam. Compared to spam containing only text or only images, it is more dangerous and complex. To distinguish spam from ham, an effective approach is needed that builds a strong representation of emails and improves classification performance. In this paper, we propose a multi-modal architecture relying on a feature model (MMA-FM) that concatenates two embedding vectors. The text and image parts of the same email are separated, and features are extracted using a hybrid model (IMTF-IDF+Skip-thoughts) and a convolutional neural network (CNN). The extracted features are concatenated and given to Naïve Bayes (NB) and Support Vector Machine (SVM) models to classify hybrid email as either spam or ham. We use two hybrid datasets based on Enron, Dredze, and TREC 2007, which are publicly accessible corpora. Our results show that the SVM model achieves an accuracy of 99.16%, higher than that of the Naïve Bayes method.
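The concatenation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' MMA-FM pipeline: plain TF-IDF stands in for IMTF-IDF+Skip-thoughts, and two simple pixel statistics stand in for a CNN embedding. The function names (`tfidf_vectors`, `image_features`, `fuse`) and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Plain TF-IDF vectors (an assumed stand-in for the paper's text model)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty documents
        vectors.append(
            [(tf[t] / length) * (math.log(n_docs / df[t]) + 1.0) for t in vocab]
        )
    return vocab, vectors


def image_features(pixels):
    """Two toy statistics over a grayscale image given as rows of 0..255 values.

    A hypothetical placeholder for a CNN embedding, not a real feature extractor.
    """
    flat = [value for row in pixels for value in row]
    mean_intensity = sum(flat) / len(flat) / 255.0
    bright_fraction = sum(1 for value in flat if value > 127) / len(flat)
    return [mean_intensity, bright_fraction]


def fuse(text_vec, image_vec):
    """Concatenate the text and image vectors into one feature vector."""
    return list(text_vec) + list(image_vec)


# Toy hybrid emails: one text body and one tiny grayscale image each.
texts = ["cheap pills buy now", "project report attached"]
images = [[[255, 255], [255, 0]], [[10, 20], [30, 40]]]

vocab, text_vecs = tfidf_vectors(texts)
fused = [fuse(t, image_features(img)) for t, img in zip(text_vecs, images)]
```

Each fused vector has one entry per vocabulary term followed by the image statistics, and would then be passed to a classifier such as NB or SVM.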
Spam messages are e-mails sent without any prior agreement between sender and receiver to receive e-mail solicitations. These messages are usually sent in bulk. To prevent spam delivery, an automated spam filter is employed. The objectives of spam filters and spam are diametrically opposed: a spam filter is effective if it recognizes spam, and ineffective when spam escapes it. Effective filtering of these bulk unsolicited e-mails is urgently needed, and their increasing volume underlines the need for dependable anti-spam filters. One widely used technique for filtering spam e-mails is machine learning, whose algorithms filter spam at commendable rates. In this project we present a method to assess classifier security against attacks, concentrating mainly on the content of the message. The dependence on a predefined set of keywords is reduced. The paper also covers related work that applies machine learning techniques, using naïve Bayes classification, to e-mail message classification.
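As a rough illustration of content-based naïve Bayes classification that learns word statistics from labeled mail instead of relying on a fixed keyword list, consider the sketch below. It is a generic multinomial naïve Bayes with Laplace smoothing, not the method of the paper; the class name `NaiveBayesSpamFilter`, the tokenizer, and the four training messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, messages, labels):
        # Assumes at least one training message per class.
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * vocab_size
            for tok in tokens:  # words never seen in training are skipped
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training set (hypothetical messages, for illustration only).
messages = [
    "win cash prize now",
    "cheap loans click now",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesSpamFilter().fit(messages, labels)
```

Calling `clf.predict("claim your cash prize now")` scores the message under both classes and returns the label with the higher log-probability. The word weights come from the training counts, so no hand-written keyword list is involved.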