A Heterogeneous Ensemble Learning Framework for Spam Detection in Social Networks with Imbalanced Data

Zhao, Chensu; Yang, Xin; Li, Xuefeng; Yang, Yixian; Chen, Yuling

doi:10.3390/app10030936

Cited by 60 publications

(32 citation statements)

References 42 publications

“…To deal with the problem of class imbalance of spam detection in social networks, Zhao et al [68] proposed a heterogeneous stacking-based ensemble learning framework, which consists of two main modules: a base module and a combined module. In the base module, they trained six separate base classifiers to generate meta-data with new features, which are fed to the combined module.…”

Section: Shallow Learning-based Detection Methods (mentioning)

confidence: 99%

Tweet-Based Bot Detection Using Big Data Analytics

et al. 2021
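
The two-module structure described in the citation statement above (base classifiers whose outputs become meta-data for a combining model) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the excerpt does not name the six base classifiers or the combiner, so three toy threshold rules and a lookup-table combiner stand in for them, and the imbalanced data is synthetic. It is not the framework of Zhao et al.

```python
import random

def make_data(n_neg=200, n_pos=20, seed=0):
    """Synthetic imbalanced data (hypothetical): 10:1 normal-to-spam ratio."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 0.5) for _ in range(3)] for _ in range(n_neg)]
    X += [[rng.gauss(3.0, 0.5) for _ in range(3)] for _ in range(n_pos)]
    y = [0] * n_neg + [1] * n_pos
    return X, y

def fit_stump(X, y, feature):
    """Toy base classifier: threshold one feature midway between class means."""
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2.0
    direction = 1.0 if mean_pos >= mean_neg else -1.0
    return lambda x: int(direction * (x[feature] - threshold) > 0)

def fit_base_module(X, y):
    """Base module: one toy classifier per feature (stand-in for six real ones)."""
    return [fit_stump(X, y, f) for f in range(len(X[0]))]

def out_of_fold_meta(X, y, k=5):
    """Meta-data: each row is predicted by base models that never saw it."""
    meta = [None] * len(X)
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        models = fit_base_module([X[i] for i in train_idx],
                                 [y[i] for i in train_idx])
        for i in range(fold, len(X), k):
            meta[i] = tuple(m(X[i]) for m in models)
    return meta

def fit_combiner(meta, y):
    """Combined module: majority label per observed base-output pattern."""
    counts = {}
    for row, t in zip(meta, y):
        counts.setdefault(row, [0, 0])[t] += 1
    table = {row: int(c[1] >= c[0]) for row, c in counts.items()}
    # Patterns never seen in training fall back to a plain majority vote.
    return lambda row: table.get(row, int(2 * sum(row) > len(row)))

def fit_stacking(X, y, k=5):
    combiner = fit_combiner(out_of_fold_meta(X, y, k), y)
    base = fit_base_module(X, y)  # refit base models on all data for inference
    return lambda x: combiner(tuple(m(x) for m in base))

X, y = make_data()
predict = fit_stacking(X, y)
preds = [predict(x) for x in X]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
```

Training the combiner on out-of-fold predictions, rather than on predictions the base models make for their own training rows, is the usual way to keep the combined module from simply trusting overfit base outputs.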

“…Table 5 and Fig. 14 computes a detailed comparative results analysis of the CIDD-ADODNN model on the test Spam dataset [23][24][25]. The resultant scores reported that HELF and KNN models have depicted inferior performance by obtaining lower accuracy values of 0.750 and 0.818, respectively.…”

Section: Performance Validation (mentioning)

confidence: 99%

Deep learning framework for handling concept drift and class imbalanced complex decision-making on streaming data

Priya; Uthra. 2021. Complex Intell. Syst.

Data science has become a popular means of supporting and improving decision-making. Because data streaming has such a wide range of applications, class imbalance and concept drift have become crucial learning problems. Deep learning (DL) models are useful for classification under concept drift in data streaming applications. This paper presents an effective class imbalance with concept drift detection (CIDD) model using Adadelta optimizer-based deep neural networks (ADODNN), named CIDD-ADODNN, for the classification of highly imbalanced streaming data. The presented model involves four processes: preprocessing, class imbalance handling, concept drift detection, and classification. The model uses the adaptive synthetic (ADASYN) technique to handle class imbalance; ADASYN applies a weighted distribution to the minority class examples based on how difficult they are to learn. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect concept drift. The ADODNN model is then used for classification. To improve the classifier performance of the DNN model, an ADO-based hyperparameter tuning process determines the optimal parameters of the DNN model. The performance of the presented model is evaluated on three streaming datasets: the intrusion detection (NSL KDDCup) dataset, the Spam dataset, and the Chess dataset. A detailed comparative analysis of the simulation results confirmed the superior performance of the presented model, which obtained maximum accuracies of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively.

“…To prevent misclassifications, which can be fatal to the filter, we use a cost-based machine learning technique. This method sets a different cost for errors that occur in the case of misclassification and attempts to minimize the sum of costs [43,44]. The asymmetric classification cost matrix is presented in Table II.…”

Section: Figure 1 Workflow Of the Proposed Framework (mentioning)

confidence: 99%

“…If is a resample that has examples, is a model generated by applying the machine learning algorithm to . Here, the risk that results when belongs to class can be defined as follows [44]:…”

Section: Figure 1 Workflow Of the Proposed Framework (mentioning)

confidence: 99%

Cost-Based Heterogeneous Learning Framework for Real-Time Spam Detection in Social Networks With Expert Decisions

Choi; Jeon. 2021. IEEE Access

With the widespread use of social networks, spam messages targeting them have become a major issue. Spam detection methods can be broadly divided into expert-based and machine learning-based methods. When experts participate in spam detection, the detection accuracy is fairly high, but the approach is highly time-consuming and expensive. Conversely, machine learning methods have the advantage of automation, but their accuracy is relatively low. This paper proposes a spam-detection framework that combines and fully exploits the advantages of both methods. To reduce the workload of the experts, all messages are first analyzed by a primary machine learning filter; those determined to be normal messages are allowed through, whereas suspicious messages are flagged. The flagged messages are subsequently analyzed by an expert to enhance the overall system accuracy. In the filtering process, cost-based machine learning is used to prevent the fatal error of misidentifying a spam message as a normal message. In addition, to keep up with continuously evolving spam trends, a module that periodically updates the training dataset with the expert-diagnosis results is incorporated into the framework. Experiments on an imbalanced dataset of spam and normal tweets, in a ratio similar to that found in real life, indicate that the proposed framework has a spam-detection rate of almost 92.8%, which is higher than that of the conventional machine learning technique. Furthermore, the proposed framework delivered stable, high performance even in an environment where social network messages changed continuously, unlike the conventional technique, which exhibited large performance deviations.

Index Terms: Expert decision making, machine learning, real-time spam detection, social network, Twitter spam.
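
The cost-based filtering idea in this entry (assign asymmetric costs to the two kinds of misclassification and choose the label that minimizes the expected cost) can be sketched as follows. This is a hedged illustration, not the cited paper's method: the cost values are hypothetical because the excerpt does not reproduce Table II, and `p_spam` is assumed to be a spam probability supplied by some upstream classifier.

```python
# Hypothetical asymmetric cost matrix, indexed as COST[predicted][actual].
# Letting spam through as "normal" is assumed to be 10x worse than
# flagging a normal message; the real Table II values are not in the excerpt.
COST = {
    "normal": {"normal": 0.0, "spam": 10.0},
    "spam": {"normal": 1.0, "spam": 0.0},
}

def expected_cost(predicted, p_spam):
    """Expected cost of emitting `predicted` when P(spam | message) = p_spam."""
    return (COST[predicted]["normal"] * (1.0 - p_spam)
            + COST[predicted]["spam"] * p_spam)

def decide(p_spam):
    """Pick the label with the lowest expected cost (ties go to 'normal')."""
    return min(COST, key=lambda label: expected_cost(label, p_spam))

print(decide(0.05))  # → normal (expected cost 0.5 vs 0.95)
print(decide(0.20))  # → spam   (expected cost 2.0 vs 0.8)
```

With these assumed costs a message is flagged as soon as its spam probability exceeds 1/11, far below the 0.5 threshold a cost-blind classifier would use, which is how an asymmetric matrix trades extra flagged messages for fewer missed spam messages.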
