Abstract. In this paper, we present an accurate and realtime PE-Miner framework that automatically extracts distinguishing features from portable executables (PE) to detect zero-day (i.e. previously unknown) malware. The distinguishing features are extracted using the structural information standardized by the Microsoft Windows operating system for executables, DLLs and object files. We follow a threefold research methodology: (1) identify a set of structural features for PE files which is computable in realtime, (2) use an efficient preprocessor for removing redundancy in the features' set, and (3) select an efficient data mining algorithm for final classification between benign and malicious executables. We have evaluated PE-Miner on two malware collections, VX Heavens and Malfease datasets which contain about 11 and 5 thousand malicious PE files respectively. The results of our experiments show that PE-Miner achieves more than 99% detection rate with less than 0.5% false alarm rate for distinguishing between benign and malicious executables. PEMiner has low processing overheads and takes only 0.244 seconds on the average to scan a given PE file. Finally, we evaluate the robustness and reliability of PE-Miner under several regression tests. Our results show that the extracted features are robust to different packing techniques and PE-Miner is also resilient to majority of crafty evasion strategies.
Commercial anti-virus software are unable to provide protection against newly launched (a.k.a "zero-day") malware. In this paper, we propose a novel malware detection technique which is based on the analysis of byte-level file content. The novelty of our approach, compared with existing content based mining schemes, is that it does not memorize specific byte-sequences or strings appearing in the actual file content. Our technique is non-signature based and therefore has the potential to detect previously unknown and zero-day malware. We compute a wide range of statistical and information-theoretic features in a block-wise manner to quantify the byte-level file content. We leverage standard data mining algorithms to classify the file content of every block as normal or potentially malicious. Finally, we correlate the block-wise classification results of a given file to categorize it as benign or malware. Since the proposed scheme operates at the byte-level file content; therefore, it does not require any a priori information about the filetype. We have tested our proposed technique using a benign dataset comprising of six different filetypes -DOC, EXE, JPG, MP3, PDF and ZIP and a malware dataset comprising of six different malware types -backdoor, trojan, virus, worm, constructor and miscellaneous. We also perform a comparison with existing data mining based malware detection techniques. The results of our experiments show that the proposed nonsignature based technique surpasses the existing techniques and achieves more than 90% detection accuracy.
No abstract
In this paper, we evaluate the performance of ten well-known evolutionary and non-evolutionary rule learning algorithms. The comparative study is performed on a real-world classification problem of detecting malicious executables. The executable dataset, used in this study, consists of 189 attributes which are statically extracted from the executables of Microsoft Windows operating system. In our study, we compare the performance of rule learning algorithms with respect to four metrics: (1) classification accuracy, (2) the number of rules in the developed rule set, (3) the comprehensibility of the generated rules, and (4) the processing overhead of the rule learning process. The results of our comparative study suggest that evolutionary rule learning classifiers cannot be deployed in real-world malware detection systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.