To counter phishing attacks, recent research has applied machine learning (ML) algorithms, with promising results. Typically, a binary classification model is trained on labeled datasets of benign and phishing URLs (and page contents) obtained via crawling. Although such phishing classifiers achieve high accuracy (precision and recall), they are also vulnerable to adversarial attacks, in which an adversary tries to evade the ML-based classifier by mimicking (the feature values of) benign web pages. Based on this observation, we propose a simple approach to building a robust phishing page detection system. Our voting-based detection system employs multiple models, each trained after inserting (controlled) noise into a randomly selected subset of the full feature set. We conduct comprehensive experiments on real datasets and, using a number of evasion strategies, evaluate the robustness of both the traditional native ML model and our proposed detection system. The results demonstrate that our proposed system performs close to the native model when there is no adversarial attack and is more robust against evasion attacks than the native model.
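The voting scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base classifier (`NearestCentroid`), the helper names (`add_noise`, `train_ensemble`, `vote`), the use of Gaussian noise with parameter `sigma`, and the three toy features are all assumptions made for illustration, since the abstract does not specify the model family, the noise distribution, or the feature set.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in base classifier (assumption: the source names no model)."""

    def fit(self, X, y):
        # One centroid per class: the column-wise mean of that class's rows.
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: dist(self.centroids[label]))


def add_noise(X, feature_idx, sigma, rng):
    """Return a copy of X with Gaussian noise added to the chosen columns."""
    noisy = [list(row) for row in X]
    for row in noisy:
        for j in feature_idx:
            row[j] += rng.gauss(0.0, sigma)
    return noisy


def train_ensemble(X, y, n_models=5, subset_size=2, sigma=0.1, seed=0):
    """Train n_models classifiers, each on data perturbed in a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(n_features), subset_size)
        models.append(NearestCentroid().fit(add_noise(X, idx, sigma, rng), y))
    return models


def vote(models, x):
    """Majority vote over the ensemble (use an odd n_models to avoid ties)."""
    counts = Counter(m.predict_one(x) for m in models)
    return counts.most_common(1)[0][0]


# Hypothetical toy data: three normalized features per page; 0 = benign, 1 = phishing.
X = [[0.2, 0.1, 0.2], [0.1, 0.2, 0.3], [0.3, 0.2, 0.1], [0.2, 0.3, 0.2],
     [0.8, 0.9, 0.8], [0.9, 0.8, 0.7], [0.7, 0.8, 0.9], [0.8, 0.7, 0.8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

models = train_ensemble(X, y)
print(vote(models, [0.1, 0.1, 0.1]), vote(models, [0.9, 0.9, 0.9]))
```

Because each model sees noise in a different feature subset, an adversary who shifts a few feature values toward benign ranges must fool a majority of differently perturbed models rather than a single decision boundary. The values of `n_models`, `subset_size`, and `sigma` here are placeholders, not settings reported in the paper.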