2010
DOI: 10.1145/1811099.1811090

A case for unsupervised-learning-based spam filtering

Abstract: Spam filtering has traditionally relied on extracting spam signatures via supervised learning, i.e., using emails explicitly and manually labeled as spam or ham. Such supervised learning is labor-intensive and costly and, more importantly, cannot adapt to new spamming behavior quickly enough. The fundamental reason for needing a labeled training corpus is that the learning, e.g., the process of extracting signatures, is carried out by examining individual emails. In this paper, we study the feasibility of unsupervised le…

Cited by 16 publications (4 citation statements)
References 21 publications
“…For SMS dataset, the highest performance (accuracy 98.7%) is achieved from the combined model of Word Embedding and neural network. Apart from the supervised learning approaches, there are numerous works on unsupervised modeling as well [17][18][19][20][21][22]. Utilizing Modified Density-Based Spatial Clustering of Applications with Noise (M-DBSCAN), 97.848% accuracy has been obtained by Manaa et al.…”
Section: Machine Learning and Deep Learning Based Methods (mentioning)
confidence: 99%
“…As the name indicates, unsupervised learning based models work only with unlabelled data so no training phase is involved; whereas supervised techniques have the requirement of training over a large dataset often requiring costly data labelling [17]. Unsupervised algorithms most commonly attempt to discover a common pattern associated with the features being processed within the dataset [18]. The algorithm rearranges the data items in separate clusters.…”
Section: Proposed Approach (mentioning)
confidence: 99%
“…However they also lead to spam due to fake positive or negative reviews. Authors [20] put forwards the development of online unsupervised spam learning and detection scheme. The learning algorithm is efficient in mining repeated occurrences of terms that are generated by templates and rarely seen in spam.…”
Section: Related Work (mentioning)
confidence: 99%