Feature Selection and Machine Learning Classification for Malware Detection

Khammas, Ban Mohammed; Monemi, Alireza; Bassi, Joseph Stephen; Ismail, Idris; Nor, Sulaiman Mohd; Marsono, Muhammad Nadzir

doi:10.11113/jt.v77.3558

Cited by 36 publications

(18 citation statements)

References 29 publications

“…The proposed method extracts n-gram features from the content of the file, and then filters the huge number of n-gram features. Snort sub-signature is used as a first stage filtering process and only the features that exist in Snort sub-signature which are different from the previous work [22] are selected. A second filter stage, the feature selection method has been used.…”

Section: Proposed Malware Detection Methods (mentioning)

confidence: 99%

Malware Detection using Sub-Signatures and Machine Learning Technique

Khammas

2018

Journal of Information Security Research

Malware is a major computer security concern because many computing systems are connected to the Internet. The volume of malware has increased over the years, and new variants are capable of evading conventional detection through obfuscation. Machine learning (ML) is one of the promising methods for detecting malware. This work presents a static malware detection system based on n-grams and ML techniques. It uses known malware sub-signatures to reduce the large feature search space that n-gram feature extraction generates. The feature space directly affects the performance and detection accuracy of ML malware classifiers. Multiple feature selection methods are analyzed to minimize the number of features, and multiple ML classifiers are analyzed to improve detection accuracy. The results show that analyzing n-grams with Snort sub-signature features gives a malware detection accuracy of more than 99.78% and a zero false positive rate (FPR) for most of the evaluated ML classifiers when 4-gram features are used.
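
The two-stage idea in this abstract (extract n-grams, then keep only those that also occur in known sub-signatures) can be illustrated with a minimal sketch. It assumes the first-stage filter is plain set membership of byte n-grams in a list of signature byte strings; the cited paper's actual Snort matching procedure is not given here. The function names and the toy byte strings are hypothetical.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte window in the given content."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def subsignature_ngrams(signatures, n: int = 4) -> set:
    """Collect all n-grams that occur inside any known sub-signature."""
    grams = set()
    for sig in signatures:
        grams.update(byte_ngrams(sig, n))
    return grams


def filtered_features(data: bytes, allowed: set, n: int = 4) -> dict:
    """Keep only the file's n-grams that also appear in a sub-signature."""
    counts = byte_ngrams(data, n)
    return {gram: count for gram, count in counts.items() if gram in allowed}


# Hypothetical toy inputs, not real Snort signatures or malware bytes.
signatures = [bytes.fromhex("deadbeef90"), bytes.fromhex("cafebabe")]
sample = bytes.fromhex("00deadbeef9000cafebabe00deadbeef")

allowed = subsignature_ngrams(signatures, n=4)
features = filtered_features(sample, allowed, n=4)
```

The surviving counts in `features` would then go to a second-stage feature selection method and a classifier, neither of which is shown here.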

“…This process is repeated k times. Finally, the average of k results is calculated to determine classifier performance [44]. In this study, k was selected as 10.…”

Section: Validation Of Classifiers (mentioning)

confidence: 99%

Transformer Oil Quality Assessment Using Random Forest with Feature Engineering

2021

Machine learning is widely used as a panacea in many engineering applications including the condition assessment of power transformers. Most statistics attribute the main cause of transformer failure to insulation degradation. Thus, a new, simple, and effective machine-learning approach was proposed to monitor the condition of transformer oils based on some aging indicators. The proposed approach was used to compare the performance of two machine-learning classifiers: J48 decision tree and random forest. The service-aged transformer oils were classified into four groups: the oils that can be maintained in service, the oils that should be reconditioned or filtered, the oils that should be reclaimed, and the oils that must be discarded. From the two algorithms, random forest exhibited a better performance and high accuracy with only a small amount of data. Good performance was achieved through not only the application of the proposed algorithm but also the approach of data preprocessing. Before feeding the classification model, the available data were transformed using the simple k-means method. Subsequently, the obtained data were filtered through correlation-based feature selection (CFsSubset). The resulting features were again retransformed by conducting the principal component analysis and were passed through the CFsSubset filter. The transformation and filtration of the data improved the classification performance of the adopted algorithms, especially random forest. Another advantage of the proposed method is the decrease in the number of the datasets required for the condition assessment of transformer oils, which is valuable for transformer condition monitoring.
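
The k-fold validation quoted in this entry's citation statement (repeat k times, then average the k results, with k = 10) can be sketched as follows. This is an illustrative stdlib-only version, not the cited study's tooling. The threshold classifier, the helper names, and the toy data are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k: int = 10, seed: int = 0) -> float:
    """Hold out each fold once, train on the rest, and average the accuracies."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        test = set(held_out)
        train = [i for i in range(len(X)) if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores)


# Hypothetical one-feature classifier: threshold midway between class means.
def fit_threshold(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy, well-separated data: class 0 near 0..9, class 1 near 100..109.
X = [float(v) for v in range(10)] + [float(v) for v in range(100, 110)]
y = [0] * 10 + [1] * 10
accuracy = cross_validate(X, y, fit_threshold, predict_threshold, k=10)
```

Every sample is used for testing exactly once, which is why the averaged score is a less optimistic estimate than accuracy measured on the training data.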

“…For this reason, the information gain of each feature is experimented. In order to experiment the information gain each used feature provides, CorrelationAttributeEval attribute evaluator, which evaluates the worth of an attribute by measuring the correlation between it and the class [12,20,42,43,56], is used with the Ranker search method. As the experimental results listed in Table 3 show the novel feature number of lines of code provides the best information gain.…”

Section: Feature Selection (mentioning)

confidence: 99%

What Static Analysis Can Utmost Offer for Android Malware Detection

Kabakuş

2019

ITC

Malicious applications are widespread on Android despite the serious actions taken by the operating system. Static and dynamic analysis techniques detect malware by identifying the signatures of malicious applications, inspecting the resources and the behaviors of malware, respectively. This study discusses the most that static analysis can offer for detecting malware in the Android ecosystem, and tests it on datasets commonly used in the literature by proposing a novel Android malware detection approach based on static analysis techniques. It introduces some novel static analysis features that are underestimated by related work and demonstrates their effectiveness for detecting malware in the Android ecosystem. The experimental results show that the proposed approach is very effective at detecting Android malware. Each feature used by the proposed approach is evaluated with different types of machine learning techniques in order to highlight its impact on detecting malware and to inform digital investigators. The accuracy of the proposed static analysis approach is calculated to be as high as 0.987 for 10,865 applications.
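
The feature ranking described in this entry's citation statement scores each attribute by its correlation with the class and then sorts the attributes by score. The quoted text calls the score "information gain", but the evaluator it names measures correlation. A minimal sketch of that idea follows, assuming Pearson correlation and ranking by absolute value. It is an illustration, not Weka's implementation. The feature names and values are invented for the example, with `lines_of_code` standing in for the "number of lines of code" feature mentioned in the quote.

```python
from math import sqrt


def pearson(xs, ys) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(columns: dict, labels: list) -> list:
    """Score each feature by |correlation with the class| and sort descending."""
    scores = {name: abs(pearson(vals, labels)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical feature table: one list of values per feature, one label per app.
columns = {
    "lines_of_code": [120, 150, 900, 1100, 130, 1000],
    "permission_count": [3, 9, 5, 4, 8, 6],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
labels = [0, 0, 1, 1, 0, 1]  # 1 = malware, 0 = benign (toy labels)

ranking = rank_features(columns, labels)
```

A constant feature scores zero here because it carries no information about the class, so it falls to the bottom of the ranking.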
