Malicious JavaScript is one of the most common tools attackers use to exploit vulnerabilities in web applications. It carries risks such as spreading malware, phishing, and collecting sensitive information. Although many types of malicious JavaScript are difficult to detect, generalizing a malicious script's signature can help catch more complex scripts that use obfuscation techniques. This paper aims to detect malicious JavaScript through structure and attribute analysis of abstract syntax trees (ASTs), which capture the generalized semantic meaning of the source code. We apply a graph convolutional neural network (GCN) to process the AST features and obtain a graph representation via neural message passing with neighborhood aggregation. An attention layer lets the method track the parts of a script that may contain the signature of malicious intent. We comprehensively evaluate the proposed approach on a real-world dataset for detecting malicious websites. The method demonstrates promising detection accuracy and robustness against obfuscated samples.
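The pipeline in this abstract (AST as a graph, message passing with neighborhood aggregation, attention over nodes) can be illustrated with a minimal pure-Python sketch. It is not the authors' model. The toy AST, the one-hot node features, the fixed weights `W` and the attention query `Q` are all assumptions made for illustration, and the layer uses the common symmetric-normalized GCN propagation rule, which the abstract does not confirm.

```python
import math

# Hypothetical hand-written AST for `eval(x)`: node types plus parent-child edges.
NODE_TYPES = ["Program", "ExpressionStatement", "CallExpression",
              "Identifier", "Identifier"]
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]

def one_hot_features(node_types):
    """One-hot encode each node by its type (an assumed, very simple attribute)."""
    vocab = sorted(set(node_types))
    return [[1.0 if t == v else 0.0 for v in vocab] for t in node_types]

def normalized_adjacency(n, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One round of neighborhood aggregation: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

def attention_readout(h, q):
    """Softmax-weighted sum of node embeddings; weights show which nodes matter."""
    scores = [sum(hi * qi for hi, qi in zip(row, q)) for row in h]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    graph_vec = [sum(alpha[i] * h[i][j] for i in range(len(h)))
                 for j in range(len(h[0]))]
    return graph_vec, alpha

# Fixed illustrative parameters; a real model would learn these.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6], [0.2, 0.1]]  # 4 node types -> 2 dims
Q = [1.0, 0.5]

features = one_hot_features(NODE_TYPES)
a_hat = normalized_adjacency(len(NODE_TYPES), EDGES)
h1 = gcn_layer(a_hat, features, W)
graph_vec, alpha = attention_readout(h1, Q)
```

Here `graph_vec` is the graph-level representation a classifier would consume, and `alpha` holds one attention weight per AST node, which is the kind of signal that could point to the suspicious part of a script.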
JavaScript-based attacks injected into webpages to perpetrate malicious activities remain a major problem in web security. Recent works have leveraged advances in artificial intelligence, considering many feature representations to improve malicious webpage detection. However, they did not focus on extracting the intention of JavaScript content, which is crucial for determining whether a webpage is malicious. In this study, we introduce an additional feature extraction process that captures the intention of a webpage's JavaScript content. In particular, we developed a framework for obtaining a JavaScript representation based on the abstract syntax tree for JavaScript (AST-JS), which enriches the webpage features for a better detection model. Moreover, we investigated how much our proposed feature improves the model’s performance by using the Shapley additive explanations (SHAP) method to quantify the significance of each feature category relative to our proposed feature. The evaluation shows that adding the AST-JS feature improves malicious webpage detection compared to previous work. We also found that the AST-JS feature significantly influences performance, especially for webpages with JavaScript content.
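To make the idea of an AST-derived JavaScript feature concrete, the sketch below turns a parsed script into a fixed-length vector by counting node types. The abstract does not say how AST-JS is encoded, so the node-type histogram is a simple stand-in, not the authors' representation. The AST literal is hand-written in an ESTree-like shape, and `TOY_AST`, `VOCAB` and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical ESTree-style AST for `document.write(unescape(payload))`.
TOY_AST = {
    "type": "Program",
    "body": [{
        "type": "ExpressionStatement",
        "expression": {
            "type": "CallExpression",
            "callee": {
                "type": "MemberExpression",
                "object": {"type": "Identifier", "name": "document"},
                "property": {"type": "Identifier", "name": "write"},
            },
            "arguments": [{
                "type": "CallExpression",
                "callee": {"type": "Identifier", "name": "unescape"},
                "arguments": [{"type": "Identifier", "name": "payload"}],
            }],
        },
    }],
}

# Assumed fixed vocabulary of node types; a real one would come from training data.
VOCAB = ["Program", "ExpressionStatement", "CallExpression",
         "MemberExpression", "Identifier", "WhileStatement"]

def iter_nodes(node):
    """Depth-first walk that yields every dict carrying a "type" key."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)

def node_type_counts(ast):
    """Count how often each node type occurs in the tree."""
    return Counter(n["type"] for n in iter_nodes(ast))

def to_vector(counts, vocab):
    """Project the counts onto a fixed vocabulary so every page gets equal length."""
    return [counts.get(t, 0) for t in vocab]

counts = node_type_counts(TOY_AST)
feature_vector = to_vector(counts, VOCAB)
print(feature_vector)  # [1, 1, 2, 1, 4, 0]
```

A vector like this could be concatenated with other webpage features before training, and an attribution method such as SHAP could then be applied per feature category to compare their influence.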