Accurate and efficient detection of metamorphic malware remains a challenge. Metamorphic malware mutates and alters its code structure with each infection, which allows it to circumvent signature-matching detection. However, some vital functionality and code segments remain unchanged between mutations. We exploit these unchanged features by means of classification with a Support Vector Machine (SVM). N-gram features are extracted directly from malware binaries to avoid disassembly, and these features are then masked with n-grams extracted from known malware signatures. The masking considerably reduces the number of selected n-gram features. Our method detects metamorphic malware with approximately 99% accuracy and a low false positive rate. The proposed method also outperforms commercially available antivirus software in detecting metamorphic malware.
Keywords: SVM classification, metamorphic, n-gram, Snort
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
Introduction
Malware is one of the security threats facing Internet users because it breaches computer security and data confidentiality. It can be categorized into general (non-mutable) and mutable types. Antivirus software relies on signature-based detection as its primary detection mechanism. Mutable malware, such as packed, polymorphic, and metamorphic malware, makes detection based on signature matching difficult. Metamorphic malware mutates and changes its code structure and signature with each infection, which makes it difficult to detect [1]. Recently, several host-based dynamic analysis techniques have been proposed for metamorphic malware detection [2]. However, these techniques require a separate environment in which to analyze the malware before it can be detected. Likewise, the binary code disassembly required by opcode-based methods [3][4][5] is not suitable for timely metamorphic malware detection in host-level intrusion detection systems.
We propose metamorphic malware detection based on static analysis of metamorphic malware binaries without disassembly. Features are extracted from the binary, which can take the form of packet payloads in a network-based detection system or files in a host-based detection system, using n-gram feature extraction, and are then classified with an SVM. In addition, the extracted n-gram features are masked with known malware signature n-grams so that only informative malware features are represented. This technique reduces the n-gram search space.
This paper is organized as follows. Section 2 provides a critical review of the relevant literature. The methodology for metamorphic malware detection in network-based and host-based intrusion detection systems (IDS) is described in Section 3. Section 4 describes the experimental setup, datasets, and evaluation criteria. The data analysis and a comparison with commercial antivirus software are presented in Section 5. Section 6 concludes with the research findings and contributions of the paper.
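To make the feature extraction idea from the introduction concrete, the following minimal Python sketch counts byte n-grams in a binary and keeps only those that also occur in known malware signatures. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the n-gram length of 4, and the toy byte strings are all hypothetical.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte window in a binary blob."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def signature_ngrams(signatures, n: int = 4) -> set:
    """Collect the n-grams that occur in known malware signatures."""
    vocab = set()
    for sig in signatures:
        vocab.update(byte_ngrams(sig, n))
    return vocab


def masked_features(data: bytes, vocab_order, n: int = 4) -> list:
    """Keep only counts of n-grams that appear in the signature vocabulary.

    The fixed ordering of vocab_order gives every sample a feature
    vector of the same length, as a classifier would require.
    """
    counts = byte_ngrams(data, n)
    return [counts.get(gram, 0) for gram in vocab_order]


# Hypothetical toy data (assumption): two short "signatures" and one sample.
signatures = [b"\xde\xad\xbe\xef\x90\x90", b"\xca\xfe\xba\xbe\x00"]
vocab = sorted(signature_ngrams(signatures))
sample = b"\x00\x01\xde\xad\xbe\xef\x90\x90\x02\xde\xad\xbe\xef"
features = masked_features(sample, vocab)
```

Because the mask discards every n-gram that is absent from the signature vocabulary, the feature vector has one entry per signature n-gram, not one per n-gram observed in the sample. Vectors of this kind could then be passed to an SVM classifier, for example `sklearn.svm.SVC` from scikit-learn.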