Priyanka Singh scite author profile

Concurrency and Computation

Sharma

et al. 2022

Summary: To deal with huge amounts of data, minimizing overhead plays a key role in fast and efficient malware detection. We propose a machine learning (ML) malware detection model with preprocessing to limit the feature overhead. Portable executable (PE) header information, which retains meaningful and distinctive information, is used to classify benign and malware files. The dataset is preprocessed by applying transformation, outlier detection and filling, and smoothing techniques. A maximum relevance minimum redundancy (mRMR) feature selection method assigns a rank and score to each feature, retaining maximally relevant and minimally redundant information. Based on the obtained ranking, multiple feature subsets are created and evaluated with support vector machine (SVM) and k-nearest neighbors (k-NN) classifiers under parameter tuning. The proposed ML model, which integrates data preprocessing, feature selection, and an SVM-polynomial classifier, achieves superior performance. It eliminates 63.8% of the feature overhead while reaching accuracy above 99.1% on the benchmark datasets. To examine the robustness of the proposed model, new balanced and imbalanced datasets are created using new malware. The test results are encouraging, with accuracy and specificity above 96.68%, 97.65%, and 91.57%, respectively. Notably, the proposed model is not trained on the newly created datasets.
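A minimal sketch of greedy mRMR ranking follows. It assumes discrete (already binned) PE header values and uses the difference criterion, where the score is relevance minus mean redundancy. The abstract does not say which mRMR variant or mutual-information estimator was used. The feature names and toy values are hypothetical. The ranked prefixes would be the candidate subsets passed to the SVM and k-NN classifiers, which are not shown here.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_rank(features, labels):
    """Greedy mRMR (difference criterion): relevance minus mean redundancy.

    features: dict mapping a feature name to a list of discrete values
    labels:   list of class labels (e.g. 0 = benign, 1 = malware)
    Returns (name, score) pairs in selection order.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, ranking = [], []
    remaining = set(features)
    while remaining:
        best, best_score = None, -math.inf
        for f in sorted(remaining):  # sorted for deterministic tie-breaking
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(features[f], features[s]) for s in selected
                ) / len(selected)
            score = relevance[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        ranking.append((best, best_score))
        remaining.remove(best)
    return ranking


# Hypothetical binned header features; SizeOfCode_copy duplicates SizeOfCode.
labels = [0, 0, 0, 1, 1, 1, 1, 1]
features = {
    "SizeOfCode": [0, 0, 0, 0, 1, 1, 1, 1],
    "SizeOfCode_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "NumberOfSections": [0, 0, 1, 1, 0, 0, 1, 1],
}
ranking = mrmr_rank(features, labels)
print([name for name, _ in ranking])
# ['SizeOfCode', 'NumberOfSections', 'SizeOfCode_copy']
```

SizeOfCode_copy is as relevant as SizeOfCode, but it adds no new information once SizeOfCode has been chosen, so mRMR ranks it last. A ranking by relevance alone would place it second.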

Performance Enhancement of SVM-based ML Malware Detection Model Using Data Preprocessing

Kumar

2022

Feed-Forward Deep Neural Network (FFDNN)-Based Deep Features for Static Malware Detection

International Journal of Intelligent Systems

Sarkar

et al. 2023

The portable executable header (PEH) information is commonly used as a feature for malware detection systems to train and validate machine learning (ML) or deep learning (DL) classifiers. We propose to extract deep features from the PEH information through the hidden layers of a feed-forward deep neural network (FFDNN). The deep features extracted from the hidden layers represent the dataset with better generalization for malware detection. When the deep features of one hidden layer are fed to the succeeding layer, the Gaussian error linear unit (GeLU) activation function is applied. The FFDNN is trained with the GeLU activation function using the deep features of individual layers as well as the concatenated deep features of all hidden layers. Similarly, the ML classifiers are trained and validated with individual-layer deep features and with the concatenated features. Three highly effective ML classifiers, random forest (RF), support vector machine (SVM), and k-nearest neighbour (k-NN), have been investigated. The performance of the proposed model is demonstrated using a statistically significant large dataset. The obtained results are encouraging in terms of classification accuracy, which reaches 99.15% with the internal discriminative deep features for the proposed FFDNN-ML classifier with the GeLU activation function.
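A minimal sketch of the deep-feature extraction step follows. It assumes fully connected hidden layers with the exact GeLU, x * Phi(x), where Phi is the standard normal CDF. The layer sizes, the He-style random initialization, and the input vector are all hypothetical. Training of the FFDNN is omitted, so the weights here are untrained. Each hidden layer's activations form one deep-feature vector. Their concatenation is the combined representation that, per the abstract, is also fed to the RF, SVM, and k-NN classifiers.

```python
import math
import random


def gelu(x):
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def init_layers(sizes, seed=0):
    """Untrained dense layers; sizes[0] is the input dimension (hypothetical init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style scaling, an assumption
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers


def dense(W, b, x):
    """Affine map W @ x + b on plain Python lists."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def deep_features(layers, x):
    """Return each hidden layer's GeLU activations and their concatenation."""
    per_layer = []
    h = x
    for W, b in layers:
        h = [gelu(z) for z in dense(W, b, h)]  # output of one layer feeds the next
        per_layer.append(h)
    concatenated = [v for layer in per_layer for v in layer]
    return per_layer, concatenated


layers = init_layers([6, 16, 8, 4], seed=42)
x = [0.2, -1.3, 0.7, 0.0, 1.1, -0.4]  # hypothetical normalized PE header values
per_layer, concat = deep_features(layers, x)
print([len(h) for h in per_layer], len(concat))  # [16, 8, 4] 28
```

Keeping the per-layer vectors separate from the concatenated vector mirrors the two training settings described in the abstract: individual-layer features and the concatenation of all hidden layers.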

Investigation and pre-processing of CLaMP malware dataset for machine learning models

Kumar

2022