2021
DOI: 10.1109/access.2021.3116128

An Unsupervised Approach for Content-Based Clustering of Emails Into Spam and Ham Through Multiangular Feature Formulation

Abstract: The rapid growth of spam email attacks, and the inherent malicious dynamism of those attacks on a range of social, personal and business activities, warrants an intelligent and automated anti-spam framework. Attempts at malware propagation, identity theft, sensitive data pilfering, and monetary as well as reputational damage are sharply increasing, endangering the privacy of the victim. Current solutions are rather incomplete when the multidimensional feature range of email is taken into account. We beli…

Cited by 17 publications (3 citation statements)
References 44 publications
“…The scientific community has agreed on a number of criteria for evaluating the classification system's quality [22]-[24]. The confusion matrix is used to assess the study's success using the following key parameters: true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN). Validity metrics such as accuracy, sensitivity/recall, specificity, F1-score, precision/positive predicted value (PPV), negative predicted value (NPV), false-negative rate (FNR), false-positive rate (FPR), false discovery rate (FDR), false omission rate (FOR), and Matthews correlation coefficient (MCC) can be calculated using these parameters [25]-[30].…”
Section: Performance Evaluation (mentioning)
confidence: 99%
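
The metrics named in this citation statement all follow from the four confusion-matrix counts using their standard textbook definitions. The sketch below is illustrative only: the function name `confusion_metrics` and the example counts are assumptions made for this example, not values or code from the cited papers.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Compute standard validity metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not taken from the cited papers.
    """
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    # Denominator of the Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": ratio(tp, tp + fn),        # sensitivity
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),     # PPV
        "npv": ratio(tn, tn + fn),
        "fnr": ratio(fn, fn + tp),
        "fpr": ratio(fp, fp + tn),
        "fdr": ratio(fp, fp + tp),
        "for": ratio(fn, fn + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Made-up counts, treating spam as the positive class.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Several of these quantities are complements of one another (FNR = 1 - recall, FPR = 1 - specificity, FDR = 1 - precision, FOR = 1 - NPV), so in practice only a subset needs to be reported.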
“…The validation assures how good the clustering solutions are by their different ways of computations. This measure aims to measure and validate the clustering quality [28].…”
Section: B Parameter Settings (mentioning)
confidence: 99%
“…Unsupervised, semi-supervised, and supervised machine learning techniques are the three types used, and in general, supervised learning performs better than the other techniques. Several Machine Learning Algorithms (MLA) can be employed for knowledge identification, including Naive Bayes (NB), Artificial Neural Networks (ANN), Support Vector Machines (SVM), and k-Nearest Neighbor (KNN) [10]-[13].…”
Section: Introduction (mentioning)
confidence: 99%