Malware classification using XGboost-Gradient Boosted Decision Tree

Kumar, Rajesh; Geetha, S.

doi:10.25046/aj050566

Cited by 31 publications (8 citation statements)

References 33 publications


“…Different types of analysis based approaches have been suggested in the literature for determining malware categories [10]. Kumar et al. [21] use the XGBoost model for malware detection. It uses the EMBER dataset, in which there are 300k malicious and 300k non-malicious instances.…”

Section: Literature Review (mentioning, confidence: 99%)

Detection of Malware Attacks using Artificial Neural Network

Rana; Minhaj Ahmad Khan

2023

VAWKUM trans. comput. sci.


Malware attacks are increasing rapidly as technology becomes more prevalent. These attacks have become extremely difficult to detect because they continuously change their mechanisms for exploiting software vulnerabilities. Conventional approaches to malware detection become ineffective owing to the large number of varying patterns and sequences, which calls for artificial intelligence-based approaches to detecting malware attacks. In this paper, we propose an artificial neural network-based model for malware detection. The proposed model is generic in that it can be applied to multiple datasets. We compared our model with different machine-learning approaches. The experimental results show that the proposed model can outperform other well-known approaches, achieving 99.6%, 98.9%, and 99.9% accuracy on the Windows API call dataset, the Top PE Imports Dataset, and the Malware Dataset, respectively.
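The first citation statement above notes that Kumar et al. use an XGBoost model for malware detection on EMBER. To illustrate what a gradient boosted decision tree of this kind does, the following is a minimal pure-Python sketch of XGBoost-style second-order boosting with depth-1 trees (stumps). It is not the `xgboost` library and not the authors' code: the function names (`best_stump`, `fit`, `predict_proba`), the toy feature values, and the hyperparameters `eta`, `lam` (L2 penalty λ) and `gamma` (split penalty γ) are illustrative assumptions. Each round computes the log-loss gradient g = p - y and Hessian h = p(1 - p), picks the split maximizing the gain ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - G²/(H+λ)] - γ, and sets each leaf weight to -G/(H+λ), shrunk by `eta`.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A stump is (feature index, threshold, left-leaf weight, right-leaf weight).
Stump = Tuple[int, float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def best_stump(X: Sequence[Sequence[float]], g: List[float], h: List[float],
               lam: float, gamma: float) -> Optional[Tuple[float, Stump]]:
    """Return (gain, stump) for the split with the largest second-order gain."""
    G, H = sum(g), sum(h)
    best: Optional[Tuple[float, Stump]] = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            GL += g[order[pos]]
            HL += h[order[pos]]
            lo, hi = X[order[pos]][j], X[order[pos + 1]][j]
            if lo == hi:  # can only split between distinct feature values
                continue
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                          - G ** 2 / (H + lam)) - gamma
            if best is None or gain > best[0]:
                stump = (j, (lo + hi) / 2, -GL / (HL + lam), -GR / (HR + lam))
                best = (gain, stump)
    return best


def fit(X: Sequence[Sequence[float]], y: List[int], n_rounds: int = 20,
        eta: float = 0.3, lam: float = 1.0, gamma: float = 0.0) -> List[Stump]:
    margins = [0.0] * len(X)  # raw log-odds scores; 0.0 corresponds to p = 0.5
    trees: List[Stump] = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss
        h = [pi * (1.0 - pi) for pi in p]       # second derivative of log loss
        found = best_stump(X, g, h, lam, gamma)
        if found is None or found[0] <= 0.0:    # no split worth its penalty
            break
        j, thr, wl, wr = found[1]
        tree = (j, thr, eta * wl, eta * wr)     # shrink leaf weights by eta
        trees.append(tree)
        margins = [m + (tree[2] if x[j] < thr else tree[3])
                   for m, x in zip(margins, X)]
    return trees


def predict_proba(trees: List[Stump], x: Sequence[float]) -> float:
    margin = sum(wl if x[j] < thr else wr for j, thr, wl, wr in trees)
    return sigmoid(margin)


# Hypothetical toy data: [section entropy, number of imported functions], 1 = malware.
X = [[7.2, 3], [7.5, 1], [6.9, 2], [7.8, 0],
     [4.1, 40], [5.0, 55], [4.6, 30], [3.9, 62]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
trees = fit(X, y)
print(predict_proba(trees, [7.0, 2]), predict_proba(trees, [4.0, 50]))
```

The real library adds deeper trees, column and row subsampling, `min_child_weight`, and sparsity-aware split finding; the sketch keeps only the second-order gain and leaf-weight formulas that distinguish XGBoost from first-order gradient boosting.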


“…Kumar and Geetha [23] proposed a malware classification scheme that constructs a model using low-end computing resources and a very large balanced dataset (the EMBER dataset, which consists of 1.1 million entries) for malware. The authors compared the performance of nine algorithms: Gaussian NB, KNN, linear support vector classification (SVC), DT, AdaBoost, RF, extra trees, gradient boost (GDB), and XGBoost.…”

Section: Reference Work (mentioning, confidence: 99%)

MLMD—A Malware-Detecting Antivirus Tool Based on the XGBoost Machine Learning Algorithm

et al. 2022


This paper focuses on training machine learning models using the XGBoost and extremely randomized trees algorithms on two datasets obtained through static and dynamic analysis of real malicious and benign samples. We then compare their success rates with each other and with other algorithms, such as random forest, decision tree, support vector machine, and naïve Bayes, which we compared in our previous work on the same datasets. The best-performing classification models, using the XGBoost algorithm, achieved 91.9% detection accuracy, 98.2% sensitivity, 0.853 AUC, and 0.949 F1 score on the static analysis dataset, and 96.4% accuracy, 98.5% sensitivity, 0.940 AUC, and 0.977 F1 score on the dynamic analysis dataset. We then exported the best-performing machine learning models and used them in our proposed MLMD program, which automates static and dynamic analysis and allows the trained models to classify new samples.
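The MLMD abstract reports accuracy, sensitivity, AUC, and F1 score. As a hedged illustration of how these metrics relate to a confusion matrix, the sketch below computes them in plain Python from hypothetical labels and scores (not the MLMD data); the 0.5 decision threshold and the function names `confusion_counts`, `roc_auc`, and `summarize` are assumptions. AUC is computed as the probability that a randomly chosen malicious sample scores higher than a randomly chosen benign one, with ties counted as one half, which equals the area under the ROC curve.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 1 = malware as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def summarize(y_true: List[int], scores: List[float],
              threshold: float = 0.5) -> Dict[str, float]:
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true positive rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "auc": roc_auc(y_true, scores),
        "false_negatives": float(fn),
    }


# Hypothetical labels and model scores (1 = malware, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
print(summarize(y_true, scores))
```

Because AUC is computed from the scores rather than from thresholded labels, it can rank two models differently from accuracy or F1, which is why papers such as this one report both.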


“…In static analysis, the features of malware may be extracted from the PE header [12] or the Application Program Interface (API) calls from the loaded dynamic link library (DLL) [13]. Features for static analysis can also be extracted from software files, such as the histogram of bytes in the sample, the entropy of parts of the sample file, and printable strings with more than five characters embedded in the sample file [14]. Raff et al. [15] used n-grams from byte code for static analysis.…”

Section: Literature Survey (mentioning, confidence: 99%)

Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach

Kumar; Geetha

2022

Sensors

Self Cite


Software products from all vendors have vulnerabilities that can pose a security concern. Malware is the prime tool for exploiting these vulnerabilities. Machine learning (ML) methods are efficient at detecting malware and represent the state of the art. The effectiveness of ML models can be improved by reducing false negatives and false positives. In this paper, the performance of bagging and boosting ML models is enhanced by reducing misclassification. Shapley values of features are a true representation of each feature's contribution and help identify the top features behind any prediction made by an ML model. Shapley values are transformed to the probability scale so that they correlate with the ML model's prediction value and reveal the top features for any prediction by a trained model. The trend of top features derived from false-negative and false-positive predictions of a trained ML model can be used to build inductive rules. In this work, the best-performing bagging and boosting ML model is determined by accuracy and the confusion matrix on three malware datasets from three different periods. The best-performing ML model is then used to build effective inductive rules using waterfall plots based on the probability scale of features. This work helps improve cyber security by effectively detecting false-negative zero-day malware.
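The static features listed in the literature-survey statement quoted above (a byte histogram, the entropy of parts of the file, printable strings, and byte n-grams) can be computed with the Python standard library alone. The sketch below is a hedged illustration, not the feature extractor used in the cited works: the function names, the 256-byte entropy window, and the `min_len=6` default (reading "more than five characters" literally) are assumptions, and a real pipeline would also parse the PE header and import table.

```python
import math
import re
from collections import Counter
from typing import List


def byte_histogram(data: bytes) -> List[float]:
    """Normalized frequency of each of the 256 byte values."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1
    return [c / total for c in counts]


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte (0.0 for constant data, 8.0 for uniform data)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def windowed_entropy(data: bytes, window: int = 256) -> List[float]:
    """Entropy of consecutive, non-overlapping chunks of the file."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]


def printable_strings(data: bytes, min_len: int = 6) -> List[str]:
    """Runs of printable ASCII characters (0x20-0x7E) at least min_len long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [s.decode("ascii") for s in re.findall(pattern, data)]


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Counts of every overlapping n-byte sequence."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Hypothetical sample bytes standing in for open(path, "rb").read().
sample = b"MZ\x90\x00\x03\x00kernel32.dll\x00hello\x00" + bytes(range(256))
print(printable_strings(sample))
print(windowed_entropy(sample))
print(byte_ngrams(b"abab").most_common(1))
```

High windowed entropy is often read as a hint of packed or encrypted content, while the histogram and n-gram counts are typically hashed or truncated to a fixed-length vector before being fed to a classifier.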

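The Zero-Day abstract describes transforming Shapley values to a probability scale for waterfall plots. The exact transformation used in that paper is not given here, so the following is only a sketch of one common approach under stated assumptions: exact Shapley values are computed by enumerating feature coalitions for a tiny hypothetical model `toy_margin` whose output is a log-odds margin, with absent features set to a hypothetical `baseline` vector. The values are then converted to probability-scale contributions by walking a waterfall (largest |φ| first) and recording how far each step moves sigmoid(margin). By the efficiency property the φ values sum to f(x) - f(baseline), so the probability steps sum to the change in predicted probability. Enumeration is exponential in the number of features; libraries such as SHAP use faster, model-specific algorithms.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def exact_shapley(f: Callable[[List[float]], float], x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; absent features take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0..n-1
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def probability_waterfall(base_margin: float,
                          phis: Sequence[float]) -> List[Tuple[int, float]]:
    """Per-feature change in sigmoid(margin), adding the largest |phi| first."""
    order = sorted(range(len(phis)), key=lambda i: -abs(phis[i]))
    margin, prob = base_margin, sigmoid(base_margin)
    steps = []
    for i in order:
        margin += phis[i]
        new_prob = sigmoid(margin)
        steps.append((i, new_prob - prob))
        prob = new_prob
    return steps


def toy_margin(z: List[float]) -> float:
    """Hypothetical log-odds model with one interaction term."""
    return -2.0 + 1.5 * z[0] + 0.8 * z[1] - 1.0 * z[0] * z[2]


x = [2.0, 1.0, 0.5]
baseline = [0.0, 0.0, 0.0]
phis = exact_shapley(toy_margin, x, baseline)
steps = probability_waterfall(toy_margin(baseline), phis)
print(phis)
print(steps)
```

Because the sigmoid is nonlinear, a feature's probability-scale contribution depends on the order in which the waterfall adds features, so such plots are best read as explanatory aids for spotting trends (for example among false negatives) rather than as exact attributions.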
