MALURLS: A Lightweight Malicious Website Classification Based on URL Features

Aldwairi, Monther; Al-Salman, Rami

doi:10.4304/jetwi.4.2.128-133

Cited by 28 publications

(5 citation statements)

References 11 publications

“…To further emphasize, there are studies which display more accurate and logical representations of their JavaScript feature sets. Studies [33][34][35][36] are of varying times and datasets which plot distributions and t-SNE plots of their JavaScript features, which count deobfuscated code length, number of events and various vector embedding representations. These features are quite similar to the ones considered in this study, which allows us to benchmark the extent of the bias.…”

Section: Feature Inspection and Experimental Results (mentioning)

confidence: 99%

Investigating the Influence of Feature Sources for Malicious Website Detection

Chaiban, Sovilj, Soliman et al. 2022

Applied Sciences

Malicious websites in general, and phishing websites in particular, attempt to mimic legitimate websites in order to trick users into trusting them. These websites, often a primary method for credential collection, pose a severe threat to large enterprises. Credential collection enables malicious actors to infiltrate enterprise systems without triggering the usual alarms. Therefore, there is a vital need to gain deep insights into the statistical features of these websites that enable Machine Learning (ML) models to classify them from their benign counterparts. Our objective in this paper is to provide this necessary investigation, more specifically, our contribution is to observe and evaluate combinations of feature sources that have not been studied in the existing literature—primarily involving embeddings extracted with Transformer-type neural networks. The second contribution is a new dataset for this problem, GAWAIN, constructed in a way that offers other researchers not only access to data, but our whole data acquisition and processing pipeline. The experiments on our new GAWAIN dataset show that the classification problem is much harder than reported in other studies—we are able to obtain around 84% in terms of test accuracy. For individual feature contributions, the most relevant ones are coming from URL embeddings, indicating that this additional step in the processing pipeline is needed in order to improve predictions. A surprising outcome of the investigation is lack of content-related features (HTML, JavaScript) from the top-10 list. When comparing the prediction outcomes between models trained on commonly used features in the literature versus embedding-related features, the gain with embeddings is slightly above 1% in terms of test accuracy. However, we argue that even this somewhat small increase can play a significant role in detecting malicious websites, and thus these types of feature categories are worth investigating further.

“…For this work ( § IV) we surveyed several classification methods, highlighting those that could potentially be used for the Booters case, named: Euclidean distance [19], [24], Squared Euclidean distance [22], Manhattan distance [20], Fractional distance [19], [23], Cosine distance [20], K-Nearest Neighbors [21], [25], [26], and Naive Bayes [14], [15], [25], [13]. From all the studied methods Support Vector [27], [28], Hamming distance [29] and Genetic Algorithm [30] were not tested in our classification investigation. However, we consider these three methods as a future work opportunity to improve our classification accuracy.…”

Section: B. Towards the Best Booter Classification Methods (mentioning)

confidence: 99%

Booter blacklist: Unveiling DDoS-for-hire websites

Santanna, Schmidt, Tuncer et al. 2016

2016 12th International Conference on Network and Service Management (CNSM)

The expansion of Distributed Denial of Service (DDoS) for hire websites, known as Booters, has radically modified both the scope and stakes of DDoS attacks. Until recently, however, Booters have only received little attention from the research community. Given their impact, addressing the challenges associated with this phenomenon is crucial. In this paper, we present a rigorous methodology to identify a comprehensive set of existing Booters in the Internet. The methodology relies on well-defined mechanisms to generate a Booter blacklist, from crawling suspect URLs to characterizing and classifying the collected URLs. The list obtained using the methodology presented in this paper has a classification accuracy of 95.5%, which is 10.5% better compared to previous work. We also demonstrate the usage of our methodology applied by the Dutch NREN, SURFNet, which started using our blacklist to extend their Booters' activities monitoring.

“…This type of attack redirects the user to a malicious website that has been altered by hackers in one or more aspects, such as its visual appearance or some of the site's contents. Hacktivists strive to take down a website for several reasons [13]. This form of action occurs when the attackers discover the vulnerabilities of the website and utilize those vulnerabilities to compromise the website and modify the content on the web page without the owner's authorization, which is technically known as penetrating a website [11].…”

Section: Defacement URL Attacks (mentioning)

confidence: 99%

Detecting Malicious URLs Using Machine Learning Techniques: Review and Research Directions

et al. 2022

In recent years, the digital world has advanced significantly, particularly on the Internet, which is critical given that many of our activities are now conducted online. As a result of attackers' inventive techniques, the risk of a cyberattack is rising rapidly. One of the most critical attacks is the malicious URL intended to extract unsolicited information by mainly tricking inexperienced end users, resulting in compromising the user's system and causing losses of billions of dollars each year. As a result, securing websites is becoming more critical. In this paper, we provide an extensive literature review highlighting the main techniques used to detect malicious URLs that are based on machine learning models, taking into consideration the limitations in the literature, detection technologies, feature types, and the datasets used. Moreover, due to the lack of studies related to malicious Arabic website detection, we highlight the directions of studies in this context. Finally, as a result of the analysis that we conducted on the selected studies, we present challenges that might degrade the quality of malicious URL detectors, along with possible solutions.

Abstract:
Surfing the World Wide Web (WWW) is becoming a dangerous everyday task with the Web becoming rich in all sorts of attacks. Websites are a major source of many scams, phishing attacks, identity theft, SPAM commerce and malwares. Howe…
