Static analysis and detection of malware is a crucial phase for handling security threats. Most researchers stated that the problem with the static analysis is an imbalance in the dataset, causing invalid result metrics. It requires more time for extracting features from the raw binaries, and methods like neural networks require more time for the training. Considering these problems, we proposed a model capable of building a feature set from the dataset and classifying static PE files efficiently. The research work was conducted to emphasize the importance of feature extraction rather than focusing on model building. The well-extracted features help to provide better results when fed to neural networks with minimal numbers of layers. Using minimum layers will enhance the performance of the model and take fewer resources and time for the processing and evaluation. In this research work, EMBER datasets published by Endgame Inc. containing PE file information are used. Feature extraction, data standardization, and data cleaning techniques are performed to handle the imbalance and impurities from the dataset. Later the extracted features were scaled into a standard form to avoid the problems related to range variations. A total of 2381 features are extracted and pre-processed from both the 2017 and 2018 datasets, respectively. The pre-processed data is then given to a deep learning model for training. The deep learning model created using dense and dropout layers to minimize the resource strain on the model and deliver more accurate results in less amount of time. The results obtained during experimentation for EMBER v2017 and v2018 datasets are 97.53% and 94.09%, respectively. The model is trained for ten epochs with a learning rate of 0.01, and it took 4 minutes/epoch, which is one minute lesser than the Decision Tree model. In terms of precision metrics, our model achieved 98.85%, which is 1.85% more as compared to the existing models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.