An Unsupervised Approach for Content-Based Clustering of Emails Into Spam and Ham Through Multiangular Feature Formulation

Karim, Asif; Azam, Sami; Shanmugam, Bharanidharan; Kannoorpatti, Krishnan

doi:10.1109/access.2021.3116128

Cited by 17 publications

(3 citation statements)

References 44 publications

“…The scientific community has agreed on a number of criteria for evaluating the classification system's quality [22]- [24]. The confusion matrix is used to assess the study's success using the following key parameters: true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) Validity metrics such as accuracy, sensitivity/recall, specificity, F1-score, precision/positive predicted value (PPV), negative predicted value (NPV), false-negative rate (FNR), false-positive rate (FPR), false discovery rate (FDR), false omission rate (FOR), and Matthews correlation coefficient (MCC) can be calculated using these parameters [25]- [30].…”

Section: Performance Evaluation (mentioning; confidence: 99%)

Analysing most efficient deep learning model to detect COVID-19 from computer tomography images

Shamrat, Chakraborty, Ahammad, et al. (2022), IJEECS

COVID-19 illness has a detrimental impact on the respiratory system, and the severity of the infection may be determined using a selected imaging technique. Chest computed tomography (CT) imaging is a reliable diagnostic technique for detecting COVID-19 early and slowing its progression. Recent research shows that deep learning algorithms, particularly convolutional neural networks (CNNs), can accurately diagnose COVID-19 from lung CT scan images. In an emergency, however, detection accuracy alone is not enough: data loss and classification completion time also play a critical role. This study addresses the issue by identifying the CNN model with the least data loss and classification time. Eight deep learning models (Max Pooling 2D, Average Pooling 2D, VGG19, VGG16, MobileNetV2, InceptionV3, AlexNet, and NFNet) are compared on a dataset of 16,000 COVID-19 and non-COVID-19 CT scan images. The models' performance is compared using the confusion matrix, together with data loss and completion time. MobileNetV2 gives the most accurate result, 99.12%, with the least data loss (0.0504%) and the lowest classification completion time (16.5 s per epoch). Thus, MobileNetV2 gives the best and quickest result in an emergency.
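The confusion-matrix metrics named in the citation statement above (accuracy, sensitivity/recall, specificity, F1-score, PPV, NPV, FNR, FPR, FDR, FOR and MCC) all follow from the four counts TP, TN, FP and FN. The Python sketch below uses the standard textbook formulas and is not taken from either paper. The `ConfusionMatrix` and `metrics` names, and the choice to return 0.0 when a denominator is zero, are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical container, e.g. spam = positive)."""
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(num: float, den: float) -> float:
    # Assumption: report 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    tp, tn, fp, fn = cm.tp, cm.tn, cm.fp, cm.fn
    precision = _ratio(tp, tp + fp)   # positive predictive value (PPV)
    recall = _ratio(tp, tp + fn)      # sensitivity / true-positive rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision_ppv": precision,
        "recall_sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
        "npv": _ratio(tn, tn + fn),   # negative predictive value
        "fnr": _ratio(fn, fn + tp),   # false-negative rate
        "fpr": _ratio(fp, fp + tn),   # false-positive rate
        "fdr": _ratio(fp, fp + tp),   # false discovery rate
        "for": _ratio(fn, fn + tn),   # false omission rate
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation coefficient
    }


if __name__ == "__main__":
    m = metrics(ConfusionMatrix(tp=40, tn=50, fp=5, fn=5))
    print(round(m["accuracy"], 3), round(m["mcc"], 3))  # prints: 0.9 0.798
```

Every rate is a ratio of two sums of counts, so a single helper that guards against empty denominators covers all eleven metrics.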

“…The validation assures how good the clustering solutions are by their different ways of computations. This measure aims to measure and validate the clustering quality [28].…”

Section: B. Parameter Settings (mentioning; confidence: 99%)

Firefly Algorithm with Mini Batch K-Means Entropy Measure for Clustering Heterogeneous Categorical Timber Data

Mahfuz, Yusoff, Nordin, et al. (2022), IJACSA

Clustering analysis is the process of identifying similar patterns in various types of data. Heterogeneous categorical data consists of ordinal, nominal, binary, and Likert-scale data. Clustering heterogeneous data remains difficult because complex and dissimilar features must be partitioned. High-quality clustering techniques are therefore needed to efficiently determine the significant features of the data. This paper uses the firefly algorithm to reduce the distance gap between features and improve clustering performance. To obtain a globally optimal clustering solution, we propose a hybrid of mini-batch k-means (MBK) clustering with entropy distance measures (EM) and the firefly optimization algorithm (FA). The study compares the performance of hybrid K-Means, Agglomerative, DBSCAN, and Affinity clustering models with EM and FA. The evaluation uses a variety of data from a timber perception survey dataset. The proposed MBK+EM+FA gives the best clustering performance, achieving an accuracy of 96.3%, an F-measure of 97%, a precision of 98%, and a recall of 97%. Other external assessments gave a Homogeneity (HOMO) of 79.14%, a Fowlkes-Mallows Index (FMI) of 93.07%, a Completeness (COMP) of 78.04%, and a V-Measure (VM) of 78.58%. The proposed MBK+EM+FA and MBK+EM took about 0.45 s and 0.35 s to compute, respectively. The excellent quality of the clustering results does not justify such time constraints. Surprisingly, the proposed model reduced the distance measure of all heterogeneous features. Future work could test the model on heterogeneous categorical data from other domains.
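Homogeneity, completeness, V-measure and the Fowlkes-Mallows index reported above are external measures that compare a clustering against reference labels. The Python sketch below implements the standard definitions, assuming hard cluster assignments, natural logarithms, and V-measure with β = 1. `external_scores` is a hypothetical helper name, and the sketch does not reproduce the MBK+EM+FA pipeline itself.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def _entropy(sizes, n: int) -> float:
    # Shannon entropy (natural log) of a partition, given its block sizes.
    return -sum((s / n) * math.log(s / n) for s in sizes if s)


def external_scores(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> dict:
    """Homogeneity, completeness, V-measure (beta = 1) and Fowlkes-Mallows index."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_true)
    class_sizes = Counter(labels_true)              # n_c
    cluster_sizes = Counter(labels_pred)            # n_k
    joint = Counter(zip(labels_true, labels_pred))  # n_ck

    h_c = _entropy(class_sizes.values(), n)         # H(C)
    h_k = _entropy(cluster_sizes.values(), n)       # H(K)
    # Conditional entropies H(C|K) and H(K|C).
    h_c_given_k = -sum((m / n) * math.log(m / cluster_sizes[k]) for (_, k), m in joint.items())
    h_k_given_c = -sum((m / n) * math.log(m / class_sizes[c]) for (c, _), m in joint.items())

    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    total = homogeneity + completeness
    v_measure = 0.0 if total == 0 else 2 * homogeneity * completeness / total

    # FMI by pair counting: TP = pairs grouped together in both labelings.
    tp = sum(math.comb(m, 2) for m in joint.values())
    pred_pairs = sum(math.comb(m, 2) for m in cluster_sizes.values())  # TP + FP
    true_pairs = sum(math.comb(m, 2) for m in class_sizes.values())    # TP + FN
    fmi = tp / math.sqrt(pred_pairs * true_pairs) if tp else 0.0

    return {"homogeneity": homogeneity, "completeness": completeness,
            "v_measure": v_measure, "fmi": fmi}


if __name__ == "__main__":
    print(external_scores([0, 0, 1, 1], [0, 0, 1, 2]))
```

V-measure is the harmonic mean of homogeneity and completeness. For cross-checking, scikit-learn exposes the same quantities as `sklearn.metrics.homogeneity_completeness_v_measure` and `sklearn.metrics.fowlkes_mallows_score`.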

“…Unsupervised, semi-supervised, and supervised machine learning techniques are the three types used, and in general, supervised learning performs better than the other techniques. Several Machine Learning Algorithms (MLA) can be employed for knowledge identification, including Naive Bayes (NB), Artificial Neural Networks (ANN), Support Vector Machines (SVM), and k-Nearest Neighbor (KNN) [10]- [13].…”

Section: Introduction (mentioning; confidence: 99%)

Feature Selection by Multiobjective Optimization: Application to Spam Detection System by Neural Networks and Grasshopper Optimization Algorithm

et al. (2022)

Spam strains networks, overloads email servers, and clogs mailboxes with unwanted messages and files. Setting the protection level of a spam filter becomes even more crucial for email users when malicious steps are taken, since they must also deal with more valid messages being marked as spam. Spam detection systems (SDS) have been developed to track spammers and filter email activity by finding patterns in email communications, and they have improved spam detection by reducing the false-positive rate and increasing detection accuracy. A key difficulty for spam classifiers is the abundance of features. Feature selection (FS) matters because it directs the search for feature subsets that improve the SDS's classification performance and accuracy. To enhance the performance of the SDS, this study uses a wrapper technique based on the multi-objective grasshopper optimization algorithm (MOGOA) for feature extraction and the recently revised EGOA algorithm for multilayer perceptron (MLP) training. The proposed system's performance was verified on the SpamBase, SpamAssassin, and UK-2011 datasets. The novel approach outperformed a variety of established approaches in the literature, reaching as much as 97.5%, 98.3%, and 96.4% on the three datasets, respectively.

Index Terms: Spam detection system (SDS), grasshopper optimization algorithm (GOA), feature selection (FS), multi-objective optimization (MOO), multilayer perceptron (MLP)
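A multi-objective wrapper such as the MOGOA-based one described above scores candidate feature subsets on competing objectives rather than on accuracy alone. The Python sketch below shows only the Pareto (non-dominance) filtering step. It assumes the two objectives are validation error and subset size, both minimized. It does not implement the grasshopper optimizer or the EGOA-trained MLP, and the `Candidate` type, feature names, and error values are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A feature subset scored by a wrapped classifier (hypothetical values)."""
    features: frozenset[str]
    error: float  # first objective: validation error, lower is better

    @property
    def size(self) -> int:
        # Second objective: number of selected features, lower is better.
        return len(self.features)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.error <= b.error and a.size <= b.size
    strictly_better = a.error < b.error or a.size < b.size
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    # Keep every candidate that no other candidate dominates (input order preserved).
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


if __name__ == "__main__":
    candidates = [
        Candidate(frozenset({"f1", "f2", "f3"}), 0.05),
        Candidate(frozenset({"f1", "f2"}), 0.06),
        Candidate(frozenset({"f1", "f3"}), 0.08),              # dominated by {f1, f2}
        Candidate(frozenset({"f2"}), 0.12),
        Candidate(frozenset({"f1", "f2", "f3", "f4"}), 0.05),  # dominated by {f1, f2, f3}
    ]
    for c in pareto_front(candidates):
        print(sorted(c.features), c.error)
```

In a full wrapper, each subset's error would come from training and validating the wrapped classifier on only those features. The front then yields a set of accuracy-versus-compactness trade-offs rather than a single winner.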