The Application of LightGBM in Microsoft Malware Detection

Pan, Qiangjian; Tang, Weiliang; Yao, Siyue

doi:10.1088/1742-6596/1684/1/012041

Cited by 9 publications

(8 citation statements)

References 6 publications

“…(27), which introduces regularization and reduces the complexity of the Tree. LightGBM (28) is an improved GBDT framework model, which uses histogram segmentation algorithm to replace the traditional pre-sorting traversal algorithm, with faster parallel training speed and higher accuracy, and can effectively prevent over-fitting. ExtraTrees is an integrated learning algorithm, which contains many decision trees and the classification result is determined by the vote of many decision trees.…”

Section: Methods (mentioning)

confidence: 99%
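
The histogram idea mentioned in the statement above can be illustrated with a minimal sketch. It is not LightGBM's implementation: it assumes a single numeric feature, squared-error loss (so every hessian equals 1), equal-width bins, and no regularization. The function name `best_histogram_split` and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple


def best_histogram_split(
    values: List[float], gradients: List[float], n_bins: int = 4
) -> Tuple[Optional[float], float]:
    """Find a split threshold by scanning bin boundaries instead of sorted samples."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant feature: everything falls into bin 0

    # Build the histogram: per-bin gradient sums and sample counts.
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), len(values)
    best_threshold, best_gain = None, 0.0
    left_g, left_n = 0.0, 0

    # Only n_bins - 1 candidate splits are evaluated, however many samples exist.
    for b in range(n_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = left_g**2 / left_n + right_g**2 / right_n - total_g**2 / total_n
        if gain > best_gain:
            best_threshold, best_gain = lo + (b + 1) * width, gain
    return best_threshold, best_gain


# Toy data: gradients change sign between the values 4 and 5.
threshold, gain = best_histogram_split(
    [1, 2, 3, 4, 5, 6, 7, 8], [-1, -1, -1, -1, 1, 1, 1, 1], n_bins=4
)
```

The point of the sketch is the cost model: after one pass to fill the bins, the search touches only the bin boundaries, which is why a histogram approach can be cheaper than traversing pre-sorted feature values.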

A preliminary screening system for diabetes based on in-car electronic nose

Weng, Liu, et al. 2023

Endocrine Connections


Studies have found differences in the concentration of volatile organic compounds in the breath of diabetics and healthy people, prompting attention to the use of devices such as electronic noses to detect diabetes. In this study, we explored the design of a non-invasive preliminary screening system for diabetes that uses a homemade electronic nose sensor array to detect respiratory gas markers. On the algorithm side, two feature extraction methods were adopted, a gradient boosting method was used to select a promising feature subset, and the Particle Swarm Optimization (PSO) algorithm was then introduced to extract the 24 most effective features, which reduces the number of sensors by 56% and lowers the system cost. Respiratory samples were collected from 120 healthy subjects and 120 diabetic subjects to assess system performance. A Random Forest (RF) algorithm was used to classify the electronic nose data, reaching an accuracy of 93.33%. Experimental results show that, while maintaining accuracy, the system is low-cost and compact once the number of sensors is optimized, and it is easy to install in a car. It provides a more feasible method for preliminary in-car screening of diabetes and can complement existing detection methods.

“…The aim is to improve the efficiency and effectiveness of Windows malware detection. Our preliminary study revealed that the LightGBM technique which is the best of the GBDT algorithm, has proven to be suitable for Windows malware detection (Abbadi et al, 2020; Pan et al, 2020) and can be improved for effective and efficient malware detection. ML-based classifiers use underlying features to distinguish between malicious and benign applications, and detecting changes in those features when malicious modifies itself.…”

Section: Anomaly-based Detection (mentioning)

confidence: 95%

“…The detection time of the model was not considered. Pan et al (2020) used Logistic Regression, KNN and LightGBM to build models based on datasets of heartbeat and threat reports. The results obtained from the respective models show that LightGBM has the highest accuracy with AUC of 0.720687.…”

Section: Anomaly-based Detection (mentioning)

confidence: 99%
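
For readers unfamiliar with the metric quoted above, AUC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way in plain Python. The function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the cited experiments; the pairwise loop is quadratic and meant only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which helps place a reported score such as 0.72 on that scale.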

EEMDS: Efficient and Effective Malware Detection System with Hybrid Model based on XceptionCNN and LightGBM Algorithm

Onoja, Jegede, Blamah, et al. 2022

JCSI


The security threats posed by malware make it imperative to build a model for efficient and effective classification of malware based on its family, irrespective of the variant. Preliminary experiments carried out demonstrate the suitability of the generic LightGBM algorithm for Windows malware as well as its effectiveness and efficiency in terms of detection accuracy, training accuracy, prediction time and training time. The prediction time of the generic LightGBM is 0.08s for binary class and 0.40s for multi-class on the Malimg dataset. The classification accuracy of the generic LightGBM is 99% True Positive Rate (TPR). Its training accuracy is 99.80% for binary class and 96.87% for multi-class, while the training time is 179.51s and 2224.77s for binary and multi classification respectively. The performance of the generic LightGBM leaves room for improvement, hence, the need to improve the classification accuracy and training accuracy of the model for effective decision making and to reduce the prediction time and training time for efficiency. It is also imperative to improve the performance and accuracy for effectiveness on larger samples. The goal is to enhance the detection accuracy and reduce the prediction time. The reduction in prediction time provides early detection of malware before it damages files stored in computer systems. Performance evaluation based on Malimg dataset demonstrates the effectiveness and efficiency of the hybrid model. The proposed model is a hybrid model which integrates XceptionCNN with LightGBM algorithm for Windows Malware classification on google colab environment. It uses the Malimg malware dataset which is a benchmark dataset for Windows malware image classification. It contains 9,339 Malware samples, structured as grayscale images, consisting of 25 families and 1,042 Windows benign executable files extracted from Windows environments. 
The proposed XceptionCNN-LightGBM technique provides improved classification accuracy of 100% TPR, with an overall reduction in the prediction time of 0.08s and 0.37s for binary and multi-class respectively. These are lower than the prediction time for the generic LightGBM which is 0.08s for binary class and 0.40s for multi-class, with an improved 100% classification accuracy. The training accuracy increased to 99.85% for binary classification and 97.40% for multi classification, with reduction in the training time of 29.97s for binary classification and 447.75s for multi classification. These are also lower than the training times for the generic LightGBM model, which are 179.51s and 2224.77s for the binary and multi classification respectively. This significant reduction in the training time makes it possible for the model to converge quickly and train a large sum of data within a relatively short period of time. Overall, the reduction in detection time and improvement in detection accuracy will minimize damages to files stored in computer systems in the event of malware attack.


“…To evaluate their work, most prior work used the area under the receiving operator characteristic curve (ROC), which is hereby referred to as the AUC score. Pan et al (2020) first preprocessed the aforementioned dataset to reduce the memory occupied by the dataset. This was done via the removal of columns that contained a > 95% proportion of null samples, switching several data types to less precise forms and converting several ordinal fields into nominal fields.…”

Section: Related Work (mentioning)

confidence: 99%
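
The three memory-reduction steps described in the statement above (dropping columns that are more than 95% null, switching to less precise data types, and treating selected fields as nominal) might look roughly like the pandas sketch below. This is an assumption-laden illustration, not the cited authors' code: the function name `reduce_memory`, the column names, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def reduce_memory(df, max_null_fraction=0.95, nominal_columns=()):
    """Drop sparse columns, downcast numerics, and mark chosen fields as nominal."""
    # Step 1: keep only columns whose share of nulls does not exceed the threshold.
    null_fraction = df.isna().mean()
    out = df.loc[:, null_fraction <= max_null_fraction].copy()

    for col in out.columns:
        if col in nominal_columns:
            # Step 3: an unordered categorical stores small codes plus a lookup table.
            out[col] = out[col].astype("category")
        elif pd.api.types.is_float_dtype(out[col]):
            # Step 2: float64 -> float32 trades precision for half the memory.
            out[col] = out[col].astype("float32")
        elif pd.api.types.is_integer_dtype(out[col]):
            # Step 2: pick the smallest integer type that holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


# Hypothetical toy frame; column names are illustrative only.
demo = pd.DataFrame(
    {
        "mostly_null": [np.nan] * 96 + [1.0] * 4,   # 96% null, so it is dropped
        "score": np.linspace(0.0, 1.0, 100),        # float64 -> float32
        "os_build": [17134, 16299] * 50,            # treated as nominal
        "flag": [0, 1] * 50,                        # int64 -> int8
    }
)
small = reduce_memory(demo, nominal_columns=("os_build",))
```

Comparing `demo.memory_usage(deep=True).sum()` with the same call on `small` shows the effect on a given frame; the saving on a real dataset depends on its column mix.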

MALPRED: Predictive Modeling for Malware Detection in Windows Systems using Ensemble Learning

Yau, Shah, Ang, et al. 2023

Preprint


Malware infections are a pervasive issue for computers running the Windows operating system. In this study, we present a machine-learning based approach to predict the likelihood of malware infection in Windows machines. Our methodology involves conducting data pre-processing, feature engineering, and selection on the Microsoft Malware Prediction dataset. We then perform extensive experimentation using various machine learning algorithms and identify XGBoost, LightGBM and CatBoost as the 3 best-performing algorithms. Through hyperparameter tuning via the Tree-Structured Parzen Estimator and using a Meta Learner on top of our top 3 best-performing algorithms, our optimal novel model achieves an AUC score of 73.24% across Stratified 5-fold cross-validation, demonstrating the efficacy of our approach.

