2017
DOI: 10.1177/1550147717703116

Fuzzy–synthetic minority oversampling technique: Oversampling based on fuzzy set theory for Android malware detection in imbalanced datasets

Abstract: In previous work, imbalanced datasets composed of more benign samples (the majority class) than malicious ones (the minority class) have been widely adopted in Android malware detection. These imbalanced datasets bias learning toward the majority class, so minority-class examples are more likely to be misclassified. To solve this problem, we propose a new oversampling method called fuzzy–synthetic minority oversampling technique, which is based on fuzzy set theory and the synthetic minority oversamp…
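
The abstract pairs SMOTE's nearest-neighbor interpolation with fuzzy membership degrees for minority samples. The paper's exact membership function is not reproduced on this page, so the following is only a minimal sketch under stated assumptions: fuzzy_membership and fuzzy_smote are hypothetical names, and membership is approximated here by distance from the majority-class centroid.

import numpy as np

def fuzzy_membership(X_min, X_maj):
    # Illustrative assumption (not the paper's definition): weight each
    # minority sample by its distance to the majority-class centroid, so
    # samples farther from the majority region get higher membership.
    centroid = X_maj.mean(axis=0)
    d = np.linalg.norm(X_min - centroid, axis=1)
    return d / d.max()

def fuzzy_smote(X_min, X_maj, n_new, k=5, seed=0):
    # SMOTE-style interpolation, choosing seed points in proportion to
    # their fuzzy membership degree instead of uniformly.
    rng = np.random.default_rng(seed)
    mu = fuzzy_membership(X_min, X_maj)
    p = mu / mu.sum()
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(len(X_min), p=p)
        # k nearest minority neighbors of the chosen seed (index 0 in the
        # sorted distances is the seed itself, so it is skipped)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbors)
        gap = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)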

Cited by 14 publications (11 citation statements)
References 44 publications
“…Next, the generator approximately minimizes the Wasserstein distance, which is equivalent to minimizing L in formula (5). Considering that the first term of formula (5) is independent of the generator, we can get the discriminator loss and generator loss of WGAN in formulas (6) and (7).…”
Section: Deep Learning Model Training
confidence: 99%
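
The quoted formula numbers point to equations that are not reproduced on this page. As a hedged reconstruction, in the standard WGAN formulation (critic f, generator G) they would read:

\[
W(\mathbb{P}_r,\mathbb{P}_g) \;\approx\; \max_{\|f\|_L \le 1} \; \mathbb{E}_{x\sim\mathbb{P}_r}[f(x)] - \mathbb{E}_{z\sim p(z)}[f(G(z))] \tag{5}
\]
\[
L_D \;=\; -\,\mathbb{E}_{x\sim\mathbb{P}_r}[f(x)] + \mathbb{E}_{z\sim p(z)}[f(G(z))] \tag{6}
\]
\[
L_G \;=\; -\,\mathbb{E}_{z\sim p(z)}[f(G(z))] \tag{7}
\]

Dropping the first term of (5), which does not depend on G, and negating gives the generator loss in (7), matching the statement above.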
“…Formula (6) is the inverse of formula (5), and formula (6) can indicate the training process. The smaller the value of formula (6), the smaller the Wasserstein distance between the real data and the generated data, and the better the WGAN training.…”
Section: Deep Learning Model Training
confidence: 99%
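
For concreteness, the sign relationship described here (the critic loss as the negative of the Wasserstein estimate) can be written as a minimal PyTorch sketch; the function names and the clipping constant are illustrative, not taken from the cited paper.

import torch

def critic_loss(critic, real, fake):
    # Negative of the empirical Wasserstein estimate (formula (6)-style):
    # minimizing this maximizes E[f(real)] - E[f(fake)].
    return -(critic(real).mean() - critic(fake).mean())

def generator_loss(critic, fake):
    # Formula (7)-style: the generator minimizes -E[f(G(z))].
    return -critic(fake).mean()

def clip_weights(critic, c=0.01):
    # The original WGAN enforces the Lipschitz constraint by clipping the
    # critic's weights after each update.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)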
“…Similarly, in [33], random sampling fixed the offset in the number of minority-class samples by random generation, enabling a better detection ratio. The authors in [34] noted that the repeated production and mixing of oversampled instances can disturb the original data distribution; instead, a series of SMOTE-based adaptations were adopted. In [10], the authors observed that malware datasets are for the most part constituted by benign samples, requiring a feasible structure that considers the degree of belongingness of each minority-class sample, a decisive factor in properly balancing a dataset. As part of the methodology, a set of algorithms (DT, RF, Naive Bayes-NB, SVM, and AdaBoost-AB) were fed inputs preprocessed with a fuzzy-theory-based SMOTE technique, reducing misclassification costs to a large extent [35].…”
Section: Related Work
confidence: 99%
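
The two baselines this passage contrasts, random duplication and SMOTE interpolation, are available off the shelf. A short usage sketch with scikit-learn and imbalanced-learn (the 90/10 toy dataset is illustrative, not from any cited paper):

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE

# Toy imbalanced dataset: roughly 90% majority, 10% minority.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Random oversampling: duplicates minority samples until classes balance.
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)

# SMOTE: synthesizes new minority samples by interpolating between
# nearest minority neighbors instead of duplicating existing ones.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)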
“…In supervised ML-based problems, when a certain number of classes are not equally distributed, the data are said to be imbalanced, biasing the algorithm toward the samples of the predominant class and leading to a degradation of classification performance. Considering [9,10], the credibility of an ML-based cyber-physical system depends on the predictive abilities of intelligent agents trained with a wide range of adversarial behavioral patterns, capable of protecting workloads against malicious activities. Indeed, data distribution has a great effect on the efficiency of different ML models [11,12]; thus, it is important to establish data-level strategies in the preprocessing and training stages.…”
Section: Introduction
confidence: 99%