The development of new technologies has made computers one of the most popular electronic products. However, there are always people who try to take advantage of others by attacking their computers. To limit property damage as much as possible, precise and efficient malware detection is essential. This work uses a dataset generated by combining heartbeat and threat reports collected by Microsoft's endpoint protection solution to find an effective detection approach. Because the dataset is large and contains many categorical variables, memory reduction and label encoding are applied during data cleaning. To handle the dimensionality problem and improve training efficiency, chi-square testing is then applied and the top 42 fields are selected. Three algorithms (Logistic Regression, KNN, and LightGBM) are used to build models, and their results are compared. The results show that the LightGBM model performs best, reaching an AUC of 0.720687, and is also the least time-consuming of the three. Finally, based on the feature importance reported by LightGBM, this work selects the three most important variables to analyze the underlying causes of malware attacks. One finding is that computers whose anti-virus software has bugs or pitfalls suffer more attacks.
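The label-encoding and chi-square selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the column names, the toy values, and the helper functions `label_encode`, `chi_square_score`, and `select_top_k` are all hypothetical, and the sketch assumes the contingency-table form of the Pearson chi-square statistic, since the abstract does not say which variant was used.

```python
from collections import Counter


def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return encoded, codes


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of independence between a feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # contingency-table cell counts
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for f, f_count in feature_totals.items():
        for y, y_count in label_totals.items():
            expected = f_count * y_count / n  # count expected under independence
            score += (observed[(f, y)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Rank columns by chi-square score and keep the k highest-scoring names."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: names and values are invented for illustration only.
raw = {
    "av_state": ["on", "off", "on", "off", "on", "off", "on", "off"],
    "os_build": ["a", "a", "b", "b", "a", "a", "b", "b"],
}
has_detections = [0, 1, 0, 1, 0, 1, 0, 1]

encoded = {name: label_encode(col)[0] for name, col in raw.items()}
top, scores = select_top_k(encoded, has_detections, k=1)
print(top, scores)
```

In this toy example `av_state` tracks the label perfectly while `os_build` is independent of it, so only `av_state` is kept. The work itself keeps 42 fields, which would correspond to `k=42` here. A library routine such as scikit-learn's `SelectKBest` with the `chi2` scorer could fill the same role, although that scorer treats feature values as non-negative counts rather than building a contingency table, so its ranking may differ from this sketch.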