Obfuscation is a mechanism used to hinder reverse engineering of programs. To cope with the large number of obfuscated programs, especially malware, reverse engineers automate the process of deobfuscation i.e. extracting information from obfuscated programs. Deobfuscation techniques target specific obfuscation transformations, which requires reverse engineers to manually identify the transformations used by a program, in what is known as metadata recovery attack. In this paper, we present Oedipus, a Python framework that uses machine learning classifiers viz., decision trees and naive Bayes, to automate metadata recovery attacks against obfuscated programs. We evaluated Oedipus' performance using two datasets totaling 1960 unobfuscated C programs, which were used to generate 11.075 programs obfuscated using 30 configurations of 6 different obfuscation transformations. Our results empirically show the feasibility of using machine learning to implement the metadata recovery attacks with classification accuracies of 100% in some cases.
The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 antiviral scanners. Unfortunately, there are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies (e.g., if 10 or more scanners deem an app malicious, it is considered malicious). While some of the utilized thresholds may be able to accurately approximate the ground truths of apps, the fact that VirusTotal changes the set and versions of the scanners it uses makes such thresholds unsustainable over time. We implemented a method, Maat , that tackles these issues of standardization and sustainability by automatically generating a Machine Learning ( ML )-based labeling scheme, which outperforms threshold-based labeling strategies. Using the VirusTotal scan reports of 53K Android apps that span 1 year, we evaluated the applicability of Maat ’s Machine Learning ( ML )-based labeling strategies by comparing their performance against threshold-based strategies. We found that such ML -based strategies (a) can accurately and consistently label apps based on their VirusTotal scan reports, and (b) contribute to training ML -based detection methods that are more effective at classifying out-of-sample apps than their threshold-based counterparts.
Program obfuscation is increasingly popular among malware creators. Objectively comparing different malware detection approaches with respect to their resilience against obfuscation is challenging. To the best of our knowledge, there is no common empirical framework for evaluating the resilience of malware detection approaches w.r.t. behavior obfuscation. We propose and implement such a framework, called FEEBO that obfuscates the observable behavior of malware binaries targeting Microsoft Windows operating systems. To assess the framework's utility, we use it to obfuscate known malware binaries and then investigate the impact on detection effectiveness of different popular behavior based malware detection approaches. We find that the obfuscation transformations employed by FEEBO can affect the accuracy of such detection approaches significantly. FEEBO is therefore an effective and fair way to test the degree of resilience of behavior-based malware detection approaches against behavior obfuscation.
In training their newly-developed malware detection methods, researchers rely on threshold-based labeling strategies that interpret the scan reports provided by online platforms, such as VirusTotal. The dynamicity of this platform renders those labeling strategies unsustainable over prolonged periods, which leads to inaccurate labels. Using inaccurately labeled apps to train and evaluate malware detection methods significantly undermines the reliability of their results, leading to either dismissing otherwise promising detection approaches or adopting intrinsically inadequate ones. The infeasibility of generating accurate labels via manual analysis and the lack of reliable alternatives force researchers to utilize VirusTotal to label apps. In the paper, we tackle this issue in two manners. Firstly, we reveal the aspects of VirusTotal's dynamicity and how they impact thresholdbased labeling strategies and provide actionable insights on how to use these labeling strategies given VirusTotal's dynamicity reliably. Secondly, we motivate the implementation of alternative platforms by (a) identifying VirusTotal limitations that such platforms should avoid, and (b) proposing an architecture of how such platforms can be constructed to mitigate VirusTotal's limitations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.