Malware developers often employ code obfuscation techniques to conceal malicious functionality, making such software difficult to detect and analyze. While various de-obfuscation techniques exist, most of them require prior knowledge of the obfuscation tools and techniques in use. Identifying the specific obfuscation tools or algorithms applied to obfuscated code is therefore vitally important, but it typically demands in-depth expert knowledge and substantial effort. This paper presents DeObA, a deep learning (DL) driven approach for the precise and efficient detection of obfuscation algorithms on fine-grained, function-level code snippets. To comprehensively capture the unique patterns or features that different obfuscation algorithms leave in code, DeObA works on multiple distinct code views: token sequences, abstract syntax trees (AST), and program dependency graphs (PDG), which reflect the code’s lexical morphology and its syntactic and structural aspects. After obfuscation-indicative features are collected individually from each code view with a well-matched DL encoder, a self-attention-based fusion strategy is applied to these features to produce an integrated, dense, yet feature-rich vector. This vector is then fed into a softmax classification layer for prediction. Owing to the lack of even a moderately sized dataset, a large obfuscation corpus is curated with 7 different obfuscation tools and a total of 12 obfuscation algorithms on 39,070 C/C++ functions. Experimental evaluations on this dataset show strong detection performance: DeObA achieves accuracy rates of 99.90% and 99.19% on the obfuscation tool detection and obfuscation algorithm detection tasks, respectively. The ablation study also confirms the benefit of considering multiple distinct code views and the effectiveness of the designed self-attention-based fusion strategy.
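The fusion and classification steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoders, the attention parameterization, or the pooling step, so the code below assumes identity query/key/value projections, mean pooling over the attended views, and small hypothetical embeddings and classifier weights chosen purely for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_views(views: List[Vector]) -> Vector:
    """Scaled dot-product self-attention across view embeddings, then mean pooling.

    Assumption: queries, keys, and values are the raw view embeddings
    (identity projections); a trained model would learn these projections.
    """
    d = len(views[0])
    scale = math.sqrt(d)
    attended = []
    for q in views:
        # Attention weights of this view over all views (including itself).
        weights = softmax([dot(q, k) / scale for k in views])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, views)) for i in range(d)]
        )
    # Mean-pool the attended views into one fused vector (an assumed choice).
    return [sum(row[i] for row in attended) / len(attended) for i in range(d)]


def classify(fused: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Linear layer followed by softmax over obfuscation classes."""
    logits = [dot(w, fused) + b for w, b in zip(weight, bias)]
    return softmax(logits)


# Hypothetical 4-dimensional embeddings standing in for the encoder outputs
# of the token-sequence, AST, and PDG views.
token_vec = [0.2, 0.1, 0.9, 0.0]
ast_vec = [0.3, 0.8, 0.1, 0.4]
pdg_vec = [0.7, 0.2, 0.2, 0.5]

fused = fuse_views([token_vec, ast_vec, pdg_vec])

# Hypothetical classifier parameters for 3 classes; the paper's
# algorithm-detection task has 12.
W = [[0.5, -0.2, 0.1, 0.0], [-0.3, 0.4, 0.2, 0.1], [0.1, 0.1, -0.4, 0.6]]
b = [0.0, 0.1, -0.1]

probs = classify(fused, W, b)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Because each attended vector is a convex combination of the view embeddings, every component of `fused` stays within the range spanned by the corresponding components of the three views. In a real model the projections and classifier weights would be learned jointly with the encoders.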