Mitigating Webshell Attacks through Machine Learning Techniques

Guo, Youguang; Marco-Gisbert, Héctor; Keir, Paul

doi:10.3390/fi12010012

Cited by 25 publications

(14 citation statements)

References 11 publications

“…Crawling session must not consist any offensive request while scanning session must consist at least one offensive request. Adjustments of number of requests and time gap is based on the gathered data. We are aware that one IP address can be shared between many clients (networks behind a NAT or many applications working parallel or in a chain) so one address is not always corresponding to one client.…”

Request tags and keywords (table fragment embedded in the quoted passage; columns: tag, reference, keywords):

[23] .git
scanner_env_file [24] .env
scanner_nmap [25] nmaplowercheck
scanner_voip_yealink [26] y000000000000.cfg, /prov
scanner_voip_asterisk /servlet
scanner_ncsi [27] ncsi.txt
scanner_sntp [28] /html/sntp.html
scanner_horde [29] /imp/test.php
scanner_weblogic_oracle [30] bea_wls_deployment_internal
scanner_pma [31] phpmyadmin, pma, phpma
scanner_wp [32] wp-, xmlrpc, plugins, wordpress, /wp/
scanner_drupal drupal
scanner_cgi [33] cgi-bin, cgi
scanner_mysql mysql
scanner_sqlite sqlite
scanner_jboss [34] .jsp
scanner_sql sql
scanner_hnap [35] hnap1
scanner_webdav webdav
scanner_login login, admin
scanner_webshell [36] .php

Section: The Methodology Of Data Analysis (mentioning)

confidence: 99%
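The rule in the quoted statement (a crawling session contains no offensive request, a scanning session contains at least one) can be illustrated with a minimal sketch. The keyword table below is a small subset of the tags listed in the quote. The substring matching and the function names `tag_request` and `classify_session` are assumptions made for illustration, not the SpiderTrap authors' implementation.

```python
# Hypothetical keyword table: a subset of the tags and keywords from the quoted statement.
OFFENSIVE_KEYWORDS = {
    "scanner_env_file": [".env"],
    "scanner_pma": ["phpmyadmin", "pma", "phpma"],
    "scanner_wp": ["wp-", "xmlrpc", "plugins", "wordpress", "/wp/"],
    "scanner_webdav": ["webdav"],
    "scanner_login": ["login", "admin"],
    "scanner_webshell": [".php"],
}


def tag_request(path):
    """Return the sorted tags whose keywords occur in the request path.

    Assumption: a keyword matches if it is a case-insensitive substring of the path.
    """
    lowered = path.lower()
    return sorted(
        tag
        for tag, keywords in OFFENSIVE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )


def classify_session(paths):
    """Label a session 'scanning' if any request is offensive, otherwise 'crawling'."""
    return "scanning" if any(tag_request(path) for path in paths) else "crawling"
```

Under these assumptions, a session that requests only `/` and `/article/1` is labeled as crawling, while a single request for `/wp-login.php` is enough to label the whole session as scanning.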

SpiderTrap—An Innovative Approach to Analyze Activity of Internet Bots on a Website

2020

The main idea behind creating SpiderTrap was to build a website that can track how Internet bots crawl it. To track bots, the honeypot dynamically generates different types of hyperlinks on the web pages leading from one article to another and logs the information passed by web clients in HTTP requests when they visit these links. By analyzing the sequences of visited links and the HTTP requests passed, it is possible to detect bots and to reveal their crawling or scanning algorithms and other characteristic features of the traffic they generate. In our research we focused on identifying and describing whole bot operations rather than just classifying single HTTP requests. This novel approach has given us insight into what different types of Internet bots are looking for and how they work. This information can be used to optimize websites for search engine bots to obtain a better place on the search results page, or to prepare a set of rules for tools that filter traffic to web pages, minimizing the impact of bad and unwanted bots on the websites' availability and security. We present the results of five months of SpiderTrap's activity, during which the honeypot was accessible through two domains (.pl and .eu) as well as by an IP address. The results show examples of the activity of well-known Internet bots, such as Googlebot and Bingbot, unknown crawlers, and scanners trying to exploit vulnerabilities in the most popular web frameworks or looking for active webshells (i.e., access points for controlling a web server left by other attackers).

“…At the same time, the feature dimension used is higher, and the internal correlation of similar features is also greater. Therefore, compared with the Naive Bayes used by Guo et al [38], Random Forest is more suitable for the sample scenario in this paper. Simultaneously, the bi-gram only expresses the relationship between the adjacent opcodes, and the opcode to express a complete sentence of Python language needs five or more, so the n = 5 used in this paper can better represent the semantic information of the text.…”

Section: Comparative Experiment (mentioning)

confidence: 95%
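The difference between bi-grams and 5-grams over an opcode sequence can be sketched with the standard-library `dis` module. This is only an illustration of the n-gram idea in the quoted statement: the helper names `opcode_sequence` and `ngrams` are hypothetical, only the top-level code object is disassembled, and the exact opcode names depend on the Python version.

```python
import dis


def opcode_sequence(source):
    """Compile Python source and return the opcode names of its top-level code object."""
    code = compile(source, "<sample>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    ops = opcode_sequence("x = 1\nprint(x)")
    print(ngrams(ops, 2))  # adjacent opcode pairs
    print(ngrams(ops, 5))  # longer windows that span a whole statement
```

A sequence of six opcodes yields five bi-grams but only two 5-grams, so the larger window trades the number of features per file for more context per feature.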

“…Unlike this paper, TF-IDF represents the frequency of a single character. Guo et al [38] recognized webshell attacks through opcode and also used TF-IDF to represent text. However, the bi-gram was used to divide characters, and the final classifier chose Naive Bayes (the method of the paper below is represented by the author's last name).…”

Section: Comparative Experiment (mentioning)

confidence: 99%
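As a rough sketch of how TF-IDF can weight opcode bi-grams, the code below uses one common textbook variant: term frequency is the count divided by document length, and inverse document frequency is the natural log of N over the document frequency. The opcode names in the usage note are made up for illustration, and neither the weighting variant nor the helper names are taken from Guo et al.; the Naive Bayes classifier itself is not shown.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Join each pair of adjacent tokens into one space-separated term."""
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def tfidf(docs):
    """Compute TF-IDF weights for a list of documents (each a list of terms).

    Returns one {term: weight} dictionary per document, where
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors
```

For two illustrative files with opcode sequences `LOAD CALL POP` and `LOAD CALL RET`, the shared bi-gram `LOAD CALL` gets weight 0 because it appears in every file, while `CALL POP` and `CALL RET` each get 0.5 × ln 2, so only the distinguishing bi-grams carry weight.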

“…There are 121 types of Python opcodes in this experiment, so the vector dimension is set to 121. The table's data intuitively shows that PBDT is significantly better than the other three solutions. In-depth analysis, the algorithm used by Guo et al [38] is Naive Bayes. In Section 4.3.3, we have compared the algorithms, and the Random Forest performs better in the classification of Python backdoors.…”

Section: Comparative Experiment (mentioning)

confidence: 99%

PBDT: Python Backdoor Detection Model Based on Combined Features

Fang, Xie, Huang

2021

Security and Communication Networks

Application security is essential in today’s period of rapid development. A backdoor is a means by which attackers can invade a system to achieve illegal purposes and damage users’ rights, and it poses a serious threat to network security. Thus, it is urgent to take adequate measures to defend against such attacks. Previous research focused mainly on PHP webshells, with less work on Python backdoor files, and language differences mean that those methods are not entirely applicable. This paper proposes a Python backdoor detection model named PBDT based on combined features. The model summarizes the common functional modules and functions in backdoor files and extracts the number of calls in the text to form sample features. It also considers the text’s statistical characteristics, including the information entropy and the longest string, among others, to identify obfuscated Python code. In addition, the opcode sequence is used to represent code characteristics, such as TF-IDF vector and FastText classifier, to eliminate the influence of interference items. Finally, the Random Forest algorithm is used to build a classifier. On a dataset that covers most types of backdoors and includes some obfuscated samples, the model achieves an accuracy of 97.70% and a TNR as high as 98.66%, showing good classification performance in Python backdoor detection.
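The two statistical features named in the abstract, information entropy and the longest string, can be sketched as follows. The abstract does not define them precisely, so this sketch assumes character-level Shannon entropy in bits and takes "longest string" to mean the longest whitespace-delimited token; both function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def longest_token_length(text):
    """Length of the longest whitespace-delimited token (assumed meaning of 'longest string')."""
    return max((len(token) for token in text.split()), default=0)
```

Encoded or packed payloads tend to show up as a single very long token with a near-uniform character distribution, which is why these two numbers are plausible signals of obfuscation under the assumptions above.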

“…In addition, due to the constant evolution and iteration of code obfuscation and code encryption techniques, webshells can easily bypass regular methods, which are based on regular expressions. Moreover, the static feature detection method has no way to conduct interprocedural analysis, that is to detect the included files and user-defined dangerous function, so the detection method is based on the feature code and syntax analysis, and the dangerous function [16] name matching can be easily bypassed.…”

Section: Introduction (mentioning)

confidence: 99%

WTA: A Static Taint Analysis Framework for PHP Webshell

Zhao, Wang, et al.

2021

Applied Sciences

Webshells are malicious scripts that can remotely control a webserver to execute arbitrary commands, steal sensitive files, and further invade the internal network. Existing webshell detection methods, such as using pattern matching for webshell detection, can be easily bypassed by attackers using the file include and user-defined functions. Furthermore, detecting unknown webshells has always been a problem in the field of webshell detection. In this paper, we propose a static webshell detection method based on taint analysis, which realizes accurate taint analysis based on ZendVM. We first converted the PHP code into Opline sequences, analyzed the Opline sequences in order, and marked the externally imported taint source. Then, the propagation of the taint variables was tracked, and the interprocedural analysis of the taint variables was performed. Finally, considering the dangerous functions’ call and the referencing of the taint variables at the point of the taint sink, we completed the webshell judgment. Based on this method, we constructed a taint analysis prototype system named WTA and evaluated it with a benchmark dataset by comparing its performance with popular webshell detection tools. The results showed that our method supports interprocedural analysis and has the ability to detect unknown webshells and that WTA’s performance surpasses well-known webshell detection tools such as D-shield, SHELLPUB, WebshellKiller, CloudWalker, ClamAV, LoKi, and findbot.pl.
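The source, propagation, and sink steps described in the abstract can be illustrated with a toy example. The instruction format below, tuples of `(op, target, operands)`, is a made-up intermediate representation and not ZendVM Opline output. The source and sink sets are small illustrative choices, and the sketch is intraprocedural only, so it does not reproduce WTA's interprocedural analysis.

```python
# Illustrative taint sources (PHP superglobals) and dangerous sinks; both sets are assumptions.
TAINT_SOURCES = {"$_GET", "$_POST", "$_REQUEST", "$_COOKIE"}
DANGEROUS_SINKS = {"eval", "system", "exec", "assert"}


def find_tainted_sinks(instructions):
    """Walk instructions in order and report sinks that receive tainted data.

    Each instruction is (op, target, operands):
      ("assign", "$a", ["$_POST"])  assigns operands to variable $a
      ("call", "system", ["$a"])    calls a function with the given arguments
    Returns a list of (instruction index, function name).
    """
    tainted = set(TAINT_SOURCES)
    hits = []
    for index, (op, target, operands) in enumerate(instructions):
        uses_taint = any(operand in tainted for operand in operands)
        if op == "assign":
            if uses_taint:
                tainted.add(target)      # taint propagates through assignment
            else:
                tainted.discard(target)  # variable overwritten with clean data
        elif op == "call" and target in DANGEROUS_SINKS and uses_taint:
            hits.append((index, target))
    return hits
```

For the sequence `$a = $_POST; $b = $a; system($b); system('ls')`, only the first `system` call is reported, because the literal argument of the second call never touches a taint source.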
