The detection of metamorphic malware has attracted significant research interest in recent years, particularly given its proliferation on mobile devices. Such malware is especially hard to detect with signature-based intrusion detection systems because it can change its code over time. This article describes a novel framework that generates sets of potential mutants and then uses them as training data to inform the development of improved detection methods (either in two separate phases or in an adversarial learning setting). We outline a method for implementing the mutant generation step using an evolutionary algorithm and provide preliminary results showing that the concept is viable as a first step towards instantiating the full framework.
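The mutant generation step described in this abstract can be illustrated with a deliberately abstract sketch. It is not the authors' method: the token-list representation, the `SIGNATURE` pattern, the no-op insertion operator, and the function names `detection_score`, `mutate`, and `evolve` are all assumptions made for illustration, and the "detector" is a plain substring match rather than a real engine.

```python
import random

# Hypothetical toy signature: a contiguous token pattern a detector looks for.
SIGNATURE = ("load", "xor", "store")


def detection_score(program):
    """Count how often the toy signature appears as a contiguous run."""
    n = len(SIGNATURE)
    return sum(
        1
        for i in range(len(program) - n + 1)
        if tuple(program[i:i + n]) == SIGNATURE
    )


def mutate(program, rng):
    """Insert a no-op token at a random position.

    In this toy model a no-op is assumed to leave behaviour unchanged.
    """
    child = list(program)
    child.insert(rng.randrange(len(child) + 1), "nop")
    return child


def evolve(seed_program, generations=50, pop_size=20, rng=None):
    """Simple (mu + lambda) evolutionary loop that minimises detection_score."""
    rng = rng or random.Random(0)
    population = [list(seed_program)]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Keep the least-detected candidates; break ties by preferring shorter ones.
        population = sorted(
            population + offspring,
            key=lambda p: (detection_score(p), len(p)),
        )[:pop_size]
    return population
```

Calling `evolve(["load", "xor", "store", "load", "xor", "store"])` returns a population of token lists sorted from least to most detected. In the framework described above, such a population would then serve as additional training data for a detector.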
In this paper, we study the effect of feature selection on malware detection using machine learning techniques. We employ supervised and unsupervised machine learning algorithms, covering both classification and clustering, with and without feature selection. The algorithms are compared for effectiveness and efficiency using predictive accuracy, among other performance metrics. We observe that the best detection rate was attained by supervised learning with feature selection, where the supervised learning algorithm was a Multilayer Perceptron (MLP). The analysis also reveals that our system can detect viruses from varying sources. CCS Concepts: • Computing methodologies ➝ Machine learning; Feature selection • Security and privacy ➝ Malware and its mitigation.
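The abstract does not say which feature selection method was used. As one possible illustration, the sketch below ranks discrete features by information gain and keeps the top `k`, a common filter-style approach. The choice of information gain and the function names `entropy`, `information_gain`, and `select_top_k` are assumptions, not details from the paper. The selected columns would then be passed to a classifier such as an MLP.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def select_top_k(rows, labels, k):
    """Return the indices of the k highest-gain features, plus all gains."""
    n_features = len(rows[0])
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return sorted(ranked[:k]), gains
```

Here `rows` is a list of equal-length feature vectors (for example, binary indicators for the presence of an API call) and `labels` holds the matching class labels. Training the same classifier on all columns and on only the selected columns gives the with and without comparison the abstract describes.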
In the field of metamorphic malware detection, training a detection model with malware samples that reflect potential mutants of the malware is crucial to developing a model resistant to future attacks. In this paper, we use a Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) algorithm to generate a large set of novel, malicious mutants that are diverse with respect to their behavioural and structural similarity to the original mutant. Using two classes of malware as a test-bed, we show that the MAP-Elites algorithm produces a large and diverse set of mutants that evade between 64% and 72% of the 63 detection engines tested. When compared to results obtained using repeated runs of an Evolutionary Algorithm that converges to a single solution, the MAP-Elites approach is shown to produce a significantly more diverse range of solutions, while providing equal or improved results in terms of evasiveness, depending on the dataset in question. In addition, the archive produced by MAP-Elites offers insight into the properties of a sample that make it undetectable by a suite of existing detection engines.
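The core MAP-Elites loop mentioned in this abstract can be sketched on a toy problem. Everything specific here is an assumption for illustration: genomes are short integer tuples, the two descriptors are simple proxies for structural and behavioural similarity to `ORIGINAL`, `fitness` is an arbitrary stand-in for evasiveness, and the archive is seeded with the original alone rather than with a random initial population.

```python
import random

# Hypothetical toy setup: a genome is a tuple of small integer tokens.
ORIGINAL = (0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3)
ALPHABET = 4   # tokens are 0..3
BINS = 5       # archive resolution per descriptor


def structural_similarity(genome):
    """Fraction of positions that match ORIGINAL (toy proxy)."""
    return sum(a == b for a, b in zip(genome, ORIGINAL)) / len(ORIGINAL)


def behavioural_similarity(genome):
    """Similarity of token histograms to ORIGINAL (toy proxy), in [0, 1]."""
    diff = sum(
        abs(genome.count(t) - ORIGINAL.count(t)) for t in range(ALPHABET)
    )
    return 1.0 - diff / (2 * len(ORIGINAL))


def fitness(genome):
    """Arbitrary stand-in objective: fraction of even tokens."""
    return sum(t % 2 == 0 for t in genome) / len(genome)


def cell_of(genome):
    """Map a genome to its archive cell via the two binned descriptors."""
    def to_bin(value):
        return min(int(value * BINS), BINS - 1)
    return (
        to_bin(structural_similarity(genome)),
        to_bin(behavioural_similarity(genome)),
    )


def map_elites(iterations=2000, rng=None):
    """Fill an archive that keeps the fittest genome seen in each cell."""
    rng = rng or random.Random(0)
    archive = {cell_of(ORIGINAL): (fitness(ORIGINAL), ORIGINAL)}
    for _ in range(iterations):
        # Pick a random elite and change one position to a random token.
        _, parent = archive[rng.choice(list(archive))]
        child = list(parent)
        child[rng.randrange(len(child))] = rng.randrange(ALPHABET)
        child = tuple(child)
        cell, score = cell_of(child), fitness(child)
        # Insert if the cell is empty or the child beats the current elite.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)
    return archive
```

Unlike an evolutionary algorithm that converges to a single solution, `map_elites()` returns one elite per occupied cell, so the result spans a range of similarity values rather than a single point. Inspecting which cells hold high-fitness elites is the kind of analysis the abstract refers to when it says the archive offers insight into sample properties.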
The proliferation of metamorphic malware has recently attracted considerable research interest because of its ability to transform its program code stochastically. Several detectors are unable to detect this malware family because of how quickly it obfuscates its code. It has also been shown that machine learning (ML) models are not robust to these attacks, because the constant code mutation of metamorphic malware leaves insufficient data to train them. Although recent studies have shown how to generate samples of metamorphic malware to serve as training data, this process can be computationally expensive. One way to improve the performance of these ML models is to transfer learning from other fields that have robust models, as has been done with the transfer of learning from computer vision and image processing to improve malware detection. In this work, we introduce an evolutionary-based transfer learning approach that uses evolved mutants of malware generated using a traditional Evolutionary Algorithm (EA), as well as models from Natural Language Processing (NLP) text classification, to improve the classification of metamorphic malware. Our preliminary results demonstrate that using NLP models can improve the classification of metamorphic malware in some instances.