2007
DOI: 10.3233/ida-2007-11505
An evaluation of Naive Bayes variants in content-based learning for spam filtering

Abstract: We describe an in-depth analysis of spam-filtering performance of a simple Naive Bayes learner and two current variants. A set of seven mailboxes comprising about 65,000 mails from seven different users, as well as a representative snapshot of 25,000 mails which were received over 18 weeks by a single user, were used for evaluation. Our main motivation was to test whether two variants of Naive Bayes learning, SpamAssassin and CRM114, were superior to simple Naive Bayes learning, represented by SpamBayes. Surpr…

citations
Cited by 34 publications
(18 citation statements)
references
References 5 publications
“…However, in order to have an adequate assessment of the performance of filters, it is necessary to adopt more realistic evaluation settings (e.g., the TREC corpora, Cormack, 2006, 2007; Cormack & Lynam, 2005), that better mimic the scenario faced by a filter deployed for practical operation. In particular, the argument raised by Cormack and Lynam (2007), and further reinforced by Seewald (2007), regarding the still unproven potential of more advanced machine learning algorithms to Spam filtering, can be associated to the evaluation scenarios considered. More than simply affecting the experimental results obtained when reporting the development of a new filter, this may inspire the development of customized filters, tailored to the characteristics of the problem (cf.…”
Section: Discussion (mentioning)
confidence: 97%
“…In addition, the integrated approach achieved similar results to the SVM. Seewald (2007) evaluated the performance of a simple naive Bayes implementation (SpamBayes), along with CRM114 and SpamAssassin, which also employ more sophisticated language models and hard-coded rules, respectively. For the initial experiments, seven private mailboxes were used.…”
Section: Comparative Studies (mentioning)
confidence: 99%
“…They currently appear to be very popular in proprietary and open-source spam filters, including several free web-mail servers and open-source systems [25,35,45]. This is probably due to their simplicity, computational complexity and accuracy rate, which are comparable to more elaborate learning algorithms [35,38,46].…”
Section: Related Work (mentioning)
confidence: 98%
“…Further details about other techniques used for anti-spam filtering and applications that employ Bayesian classifiers are available in Bratko et al [9], Seewald [45], Koprinska et al [32], Cormack [14], Song et al [46], Marsono et al [35] and Guzella and Caminhas [25].…”
Section: Related Work (mentioning)
confidence: 99%