API Call-Based Malware Classification Using Recurrent Neural Networks

Li, Chen; Zheng, Junjun

doi:10.13052/jcsm2245-1439.1036

Cited by 18 publications (9 citation statements)

References 29 publications (30 reference statements)


“…Their pioneering work underscores the substantial accomplishments attained through the integration of deep learning techniques within API-sequence-based malware classification. In the same way, C Li's work [20] also demonstrates the RNN's ability to classify the API call sequences alone. In a subsequent development, Li et al [21] have further refined the network architecture, introducing the extraction of inherent features from API sequences.…”

Section: Deep Learning-Based or API-Call-Related Malware Classification (mentioning; confidence: 82%)

“…The first five methods are classic methods [14,44–47] to do the malware family classification, and we report the results from their papers. The following five methods [16,20,21,23,48] are the latest effective work on the classification based on API calls, so we reproduce the methods and offer a convincing comparison result. The [21] method adopts a two-way feature extraction architecture for API calls, but the core module is a multi-layer CNN, and the correlation analysis is performed through Bi-LSTM.…”

Section: Comparison With Previous Methods (mentioning; confidence: 99%)


TTDAT: Two-Step Training Dual Attention Transformer for Malware Classification Based on API Call Sequences

Wang, Lin, et al. (2023), Applied Sciences


The surge in malware threats propelled by the rapid evolution of the internet and smart device technology necessitates effective automatic malware classification for robust system security. While existing research has relied primarily on feature extraction techniques, issues such as information loss and computational overhead persist, especially in instruction-level tracking. To address these issues, this paper focuses on the nuanced analysis of API (Application Programming Interface) call sequences between the malware and the system and introduces TTDAT (Two-step Training Dual Attention Transformer) for malware classification. TTDAT uses the Transformer architecture with the original multi-head attention plus an integrated local attention module, streamlining the encoding of API sequences and extracting both global and local patterns. To expedite detection, we introduce a two-step training strategy in which ensembled Transformer models generate class representation vectors, thereby bolstering efficiency and adaptability. Our extensive experiments demonstrate TTDAT’s effectiveness, showcasing state-of-the-art results with an average F1 score of 0.90 and an accuracy of 0.96.
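The abstract describes pairing the Transformer's global multi-head attention with a local attention module. As a rough illustration only, the sketch below implements single-head scaled dot-product attention in plain Python with an optional sliding window. The `window` restriction is our assumed stand-in for a local module, and the names `attention` and `dual_attention` (including the element-wise-sum fusion) are hypothetical, not TTDAT's code.

```python
import math


def softmax(scores):
    # Numerically stable softmax; masked (-inf) scores receive weight 0.
    m = max(scores)
    exps = [0.0 if s == float("-inf") else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, window=None):
    """Single-head scaled dot-product attention over one API-call sequence.

    q, k, v: lists of equal-length vectors, one per call position.
    window=None: global attention (every position sees every position).
    window=w: position i only attends to positions j with |i - j| <= w (assumed local variant).
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if window is not None and abs(i - j) > window:
                scores.append(float("-inf"))  # outside the local window: masked out
            else:
                scores.append(sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def dual_attention(x, window=1):
    # Hypothetical fusion of global and local views by element-wise addition.
    glob = attention(x, x, x)
    loc = attention(x, x, x, window=window)
    return [[a + b for a, b in zip(gr, lr)] for gr, lr in zip(glob, loc)]


# Toy embeddings for a three-call sequence (values are arbitrary).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = dual_attention(x, window=1)
```

With `window=None` each call attends to the whole sequence; with a small window each call sees only its neighbors, which is one simple way to emphasize local patterns. How TTDAT actually combines the two views is not specified in the abstract.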


“…The benchmark dataset was imbalanced in some malware families, such as Adware and Spyware. Hence, accuracy evaluation was not enough to identify the best classifier and make fair comparisons with other research [28,30]; the same evaluation metrics, precision, recall, and F1 score were used to present the results.…”

Section: Results (mentioning; confidence: 99%)
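On an imbalanced benchmark, accuracy can look high even when a minority family is never predicted, which is why per-class and macro-averaged precision, recall, and F1 are reported instead. A minimal standard-library sketch of those metrics follows; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from per-class TP/FP/FN counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    # Unweighted mean over classes, so a small family counts as much as a large one.
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy imbalanced case: 8 Trojan and 2 Adware samples, classifier always answers "Trojan".
y_true = ["Trojan"] * 8 + ["Adware"] * 2
y_pred = ["Trojan"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
scores = per_class_prf(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
```

In this toy case the majority-class predictor reaches 0.8 accuracy, yet its macro F1 is only 4/9 (about 0.44) because the Adware class scores an F1 of 0.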

“…RNN is highly efficient at processing time series sequences, especially in the natural language processing field. Li et al [28] presented a classification model for malware families using the RNN model. Long API call sequences are used as classification features for variants of malware.…”

Section: Related Work (mentioning; confidence: 99%)
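The statements above describe classifying malware families by feeding API call sequences to a recurrent network. To make the data flow concrete, here is a minimal, untrained Elman RNN in plain Python: API names are mapped to integer ids, the hidden state is updated once per call, and the final state is projected to family probabilities. This is an illustrative sketch, not the architecture of the cited paper (practical models typically use LSTM or GRU cells and train with backpropagation through time, omitted here); `build_vocab`, `TinyRNNClassifier`, and the example traces are hypothetical.

```python
import math
import random


def build_vocab(sequences):
    # Map each distinct API name to an integer id; id 0 is reserved for unseen APIs.
    vocab = {"<unk>": 0}
    for seq in sequences:
        for api in seq:
            vocab.setdefault(api, len(vocab))
    return vocab


class TinyRNNClassifier:
    """Untrained Elman RNN over one-hot API ids (illustrative only).

    h_t = tanh(W_xh[:, x_t] + W_hh @ h_{t-1} + b_h);  probs = softmax(W_hy @ h_T + b_y)
    """

    def __init__(self, vocab_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real model would learn these.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_xh = init(hidden_size, vocab_size)   # input -> hidden
        self.W_hh = init(hidden_size, hidden_size)  # hidden -> hidden (recurrence)
        self.W_hy = init(num_classes, hidden_size)  # last hidden -> class scores
        self.b_h = [0.0] * hidden_size
        self.b_y = [0.0] * num_classes

    def forward(self, token_ids):
        h = [0.0] * len(self.b_h)
        for t in token_ids:
            # For a one-hot input, W_xh @ x_t is simply column t of W_xh.
            h = [
                math.tanh(self.W_xh[i][t] + sum(w * hj for w, hj in zip(self.W_hh[i], h)) + self.b_h[i])
                for i in range(len(h))
            ]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.W_hy, self.b_y)]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical traces: real Windows API names used purely as example tokens.
traces = [
    ["NtCreateFile", "NtWriteFile", "NtClose"],
    ["RegOpenKeyExW", "RegSetValueExW"],
]
vocab = build_vocab(traces)
model = TinyRNNClassifier(vocab_size=len(vocab), hidden_size=8, num_classes=3)
probs = model.forward([vocab.get(api, 0) for api in traces[0]])
```

Because the hidden state is carried across every step, the cost grows linearly with trace length, which is one reason very long API sequences are often shortened before training.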

Features Engineering for Malware Family Classification Based API Call

2022


Malware is used to carry out malicious operations on networks and computer systems. Consequently, malware classification is crucial for preventing malicious attacks. Application programming interfaces (APIs) are ideal candidates for characterizing malware behavior. However, the primary challenge is to produce API call features that allow classification algorithms to achieve high accuracy. To this end, this work employed Jaccard similarity and visualization analysis to find the hidden patterns created by the API calls of various malware. Traditional machine learning classifiers, i.e., random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN), were used in this research as alternatives to existing neural networks, which process API call sequences millions of calls long. The benchmark dataset used in this study contains 7107 samples of API call sequences labeled with eight different malware families. The results showed that RF with the proposed API call features outperformed the LSTM (long short-term memory) and gated recurrent unit (GRU)-based methods across the overall evaluation metrics.
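The abstract mentions using Jaccard similarity to expose patterns shared by malware API calls. The sketch below assumes similarity is measured over the sets of distinct API names each sample calls; the paper may use other units (for example n-grams), and the helper names and traces here are hypothetical.

```python
from itertools import combinations


def jaccard(calls_a, calls_b):
    # |A ∩ B| / |A ∪ B| over the sets of distinct API names each sample invokes.
    a, b = set(calls_a), set(calls_b)
    if not a and not b:
        return 1.0  # convention: two empty traces count as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(samples):
    # Average over all sample pairs; a high value within a family suggests shared behavior.
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


trace_1 = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "GetProcAddress"]
trace_2 = ["GetProcAddress", "VirtualAlloc", "CreateThread"]
sim = jaccard(trace_1, trace_2)  # 2 shared APIs out of 4 distinct ones
```

Working on sets discards call order and repetition counts, which keeps the comparison cheap even for very long traces; that trade-off is a design choice of this sketch.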


“…Eliminating redundant APIs from malware API sequences has proven effective [29–32]. Our research used the following three commonly used methods to remove duplicate calls.…”

Section: Proposed Methods (mentioning; confidence: 99%)
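The quoted passage does not name its three de-duplication methods. The sketch below shows three common ways to shrink a trace, chosen by us for illustration and not necessarily the ones the authors used: collapsing runs of identical consecutive calls, capping each run at a fixed length, and keeping only the first occurrence of each API. Function names and the trace are hypothetical.

```python
from itertools import groupby


def collapse_runs(calls):
    # Replace each run of identical consecutive calls with a single call.
    return [api for api, _ in groupby(calls)]


def cap_runs(calls, max_run=2):
    # Keep at most `max_run` copies from each run of identical consecutive calls.
    kept = []
    for _, run in groupby(calls):
        kept.extend(list(run)[:max_run])
    return kept


def first_occurrences(calls):
    # Keep only the first time each API appears, preserving call order.
    return list(dict.fromkeys(calls))


trace = ["NtReadFile", "NtReadFile", "NtReadFile", "NtClose", "NtReadFile", "NtClose", "NtClose"]
```

Collapsing or capping runs preserves the overall call order while removing loop-driven repetition; keeping only first occurrences is the most aggressive option and discards any later reuse of an API.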

Channel Features and API Frequency-Based Transformer Model for Malware Identification

Qian, Cong (2024), Sensors


Malicious software (malware), in various forms and variants, continues to pose significant threats to user information security. Researchers have identified the effectiveness of utilizing API call sequences to identify malware. However, the evasion techniques employed by malware, such as obfuscation and complex API call sequences, challenge existing detection methods. This research addresses this issue by introducing CAFTrans, a novel transformer-based model for malware detection. We enhance the traditional transformer encoder with a one-dimensional channel attention module (1D-CAM) to improve the correlation between API call vector features, thereby enhancing feature embedding. A word frequency reinforcement module is also implemented to refine API features by preserving low-frequency API features. To capture subtle relationships between APIs and achieve more accurate identification of features for different types of malware, we leverage convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Experimental results demonstrate the effectiveness of CAFTrans, achieving state-of-the-art performance on the mal-api-2019 dataset with an F1 score of 0.65252 and an AUC of 0.8913. The findings suggest that CAFTrans improves accuracy in distinguishing between various types of malware and exhibits enhanced recognition capabilities for unknown samples and adversarial attacks.
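The abstract says CAFTrans adds a word frequency reinforcement module that preserves low-frequency API features, but it does not describe the module's internals. As a generic stand-in, and explicitly not CAFTrans's method, the sketch below uses TF-IDF-style weighting (with the smoothed IDF formula also used by scikit-learn's default) so that APIs appearing in few samples receive larger weights per occurrence than ubiquitous ones. The function name and the toy traces are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sequences):
    """Return one {api: weight} dict per API-call sequence (illustrative TF-IDF)."""
    # Document frequency: number of samples in which each API appears at least once.
    df = Counter()
    for seq in sequences:
        df.update(set(seq))
    n = len(sequences)
    # Smoothed inverse document frequency: rarer APIs get larger weights.
    idf = {api: math.log((1 + n) / (1 + d)) + 1.0 for api, d in df.items()}
    vectors = []
    for seq in sequences:
        tf = Counter(seq)
        total = len(seq)
        vectors.append({api: (count / total) * idf[api] for api, count in tf.items()})
    return vectors


traces = [
    ["NtOpenFile", "NtOpenFile", "CryptEncrypt"],  # CryptEncrypt appears in one sample only
    ["NtOpenFile", "InternetOpenA"],
]
vectors = tfidf_vectors(traces)
```

Here `NtOpenFile`, present in every sample, keeps an IDF of 1.0, while `CryptEncrypt` and `InternetOpenA` are boosted, so a rarely seen call is not drowned out by frequent ones.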
