2017
DOI: 10.1177/1550147717703116

Fuzzy–synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets

Abstract: In previous work, imbalanced datasets composed of more benign samples (the majority class) than malicious ones (the minority class) have been widely adopted in Android malware detection. These imbalanced datasets bias learning toward the majority class, so minority-class examples are more likely to be misclassified. To solve this problem, we propose a new oversampling method called fuzzy–synthetic minority oversampling technique, which is based on fuzzy set theory and the synthetic minority oversamp…
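
The abstract pairs SMOTE's nearest-neighbor interpolation with fuzzy membership degrees for minority samples. The paper's exact membership function is not reproduced on this page, so the following is only a minimal sketch under stated assumptions: fuzzy_membership and fuzzy_smote are hypothetical names, and membership is approximated here by distance from the majority-class centroid.

import numpy as np

def fuzzy_membership(X_min, X_maj):
    # Illustrative assumption (not the paper's definition): weight each
    # minority sample by its distance to the majority-class centroid, so
    # samples farther from the majority region get higher membership.
    centroid = X_maj.mean(axis=0)
    d = np.linalg.norm(X_min - centroid, axis=1)
    return d / d.max()

def fuzzy_smote(X_min, X_maj, n_new, k=5, seed=0):
    # SMOTE-style interpolation, choosing seed points in proportion to
    # their fuzzy membership degree instead of uniformly.
    rng = np.random.default_rng(seed)
    mu = fuzzy_membership(X_min, X_maj)
    p = mu / mu.sum()
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(len(X_min), p=p)
        # k nearest minority neighbors of the chosen seed (index 0 in the
        # sorted distances is the seed itself, so it is skipped)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbors)
        gap = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)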

Cited by 14 publications (11 citation statements)
References 44 publications
“…Next, the generator approximately minimizes the Wasserstein distance, which is equivalent to minimizing L in formula (5). Considering that the first term of formula (5) is independent of the generator, we can get the discriminator loss and generator loss of WGAN in formulas (6) and (7).…”
Section: Deep Learning Model Training
confidence: 99%
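
The quoted formula numbers point to equations that are not reproduced on this page. As a hedged reconstruction, in the standard WGAN formulation (critic f, generator G) they would read:

\[
W(\mathbb{P}_r,\mathbb{P}_g) \;\approx\; \max_{\|f\|_L \le 1} \; \mathbb{E}_{x\sim\mathbb{P}_r}[f(x)] - \mathbb{E}_{z\sim p(z)}[f(G(z))] \tag{5}
\]
\[
L_D \;=\; -\,\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)] + \mathbb{E}_{z\sim p(z)}[f(G(z))] \tag{6}
\]
\[
L_G \;=\; -\,\mathbb{E}_{z\sim p(z)}[f(G(z))] \tag{7}
\]

Dropping the first term of (5), which does not depend on G, and negating gives the generator loss in (7), matching the statement above.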
“…Formula (6) is the inverse of formula (5), and formula (6) can indicate the training process. The smaller the value of formula (6), the smaller the Wasserstein distance between the real data and the generated data, and the better the WGAN training.…”
Section: Deep Learning Model Training
confidence: 99%
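
For concreteness, the sign relationship described here (the critic loss as the negative of the Wasserstein estimate) can be written as a minimal PyTorch sketch; the function names and the clipping constant are illustrative, not taken from the cited paper.

import torch

def critic_loss(critic, real, fake):
    # Negative of the empirical Wasserstein estimate (formula (6)-style):
    # minimizing this maximizes E[f(real)] - E[f(fake)].
    return -(critic(real).mean() - critic(fake).mean())

def generator_loss(critic, fake):
    # Formula (7)-style: the generator minimizes -E[f(G(z))].
    return -critic(fake).mean()

def clip_weights(critic, c=0.01):
    # The original WGAN enforces the Lipschitz constraint by clipping the
    # critic's weights after each update.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)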
“…Similarly, in [33], random sampling fixed the offset in the number of minority-class samples by random generation, enabling a better detection ratio. The authors in [34] noted that the repeated production and mixing of oversampled instances can disturb the original data distribution; instead, a series of SMOTE-based adaptations were adopted. In [10], the authors observed that malware datasets are for the most part constituted by benign samples, requiring a feasible structure that considers the degree of belongingness of each minority-class sample, a decisive factor in properly balancing a dataset. As part of the methodology, a set of algorithms (DT, RF, Naive Bayes-NB, SVM, and AdaBoost-AB) were fed inputs preprocessed with a fuzzy-theory-based SMOTE technique, reducing misclassification costs to a large extent [35].…”
Section: Related Work
confidence: 99%
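
The two baselines this passage contrasts, random duplication and SMOTE interpolation, are available off the shelf. A short usage sketch with scikit-learn and imbalanced-learn (the 90/10 toy dataset is illustrative, not from any cited paper):

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Toy imbalanced dataset: roughly 90% majority, 10% minority.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Random oversampling: duplicates minority samples until classes balance.
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)

# SMOTE: synthesizes new minority samples by interpolating between
# nearest minority neighbors instead of duplicating existing ones.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)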
“…In supervised ML-based problems, when a certain number of classes are not equally distributed, the data are said to be imbalanced, biasing the algorithm toward the samples of the predominant class and leading to a degradation of classification performance. Considering [9,10], the credibility of an ML-based cyber-physical system depends on the predictive abilities of intelligent agents trained with a wide range of adversarial behavioral patterns, capable of protecting workloads against malicious activities. Indeed, data distribution has a great effect on the efficiency of different ML models [11,12]; thus, it is important to establish data-level strategies in the preprocessing and training stages.…”
Section: Introduction
confidence: 99%