Phishing is a growing concern for end-users and security experts alike. Despite decades of development, existing phishing detection techniques still suffer from limited accuracy and an inability to detect unknown attacks. To address these problems, many cybersecurity researchers have turned to phishing detection based on machine learning. In recent years, deep learning, a branch of machine learning, has emerged as a promising solution for phishing detection. Accordingly, this study proposes a taxonomy of deep learning algorithms for phishing detection by examining 81 selected papers using a systematic literature review approach. The paper first introduces the concepts of phishing and deep learning in the context of cybersecurity. Then, phishing detection and deep learning algorithm taxonomies are provided to classify the existing literature into various categories. Next, taking the proposed taxonomy as a baseline, this study comprehensively reviews the state-of-the-art deep learning techniques and analyzes their advantages and disadvantages. Subsequently, the paper discusses the issues deep learning faces in phishing detection and proposes future research directions to overcome these challenges. Finally, an empirical analysis is conducted to evaluate the performance of various deep learning techniques in a practical context and to highlight the related issues that motivate future work. The results of the empirical experiment showed that the common issues among most state-of-the-art deep learning algorithms are manual parameter tuning, long training time, and deficient detection accuracy.
Phishing detection with high accuracy and low computational complexity has always been a topic of great interest. In recent years, new technologies have been developed to improve the phishing detection rate and reduce computational constraints. However, no single solution is sufficient to address all the problems caused by attackers in cyberspace. Therefore, the primary objective of this paper is to analyze the performance of various deep learning (DL) algorithms in detecting phishing activities. This analysis will help organizations and individuals select and adopt a suitable solution according to their technological needs and specific applications’ requirements to fight against phishing attacks. To this end, an empirical study was conducted using four deep learning algorithms: deep neural network (DNN), convolutional neural network (CNN), long short-term memory (LSTM), and gated recurrent unit (GRU). To analyze the behavior of these architectures, extensive experiments were carried out to examine the impact of parameter tuning on the accuracy of the models. In addition, various performance metrics were measured to evaluate the effectiveness and feasibility of DL models in detecting phishing activities. The results showed that no single DL algorithm achieved the best measures across all performance metrics. The empirical findings also reveal several issues and suggest future research directions related to deep learning in the phishing detection domain.
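The abstract does not say which performance metrics were measured. As a minimal sketch, assuming the usual binary-classification metrics (accuracy, precision, recall, F1, and false positive rate) and a labeling of 1 for phishing and 0 for legitimate, a model's predictions could be scored as follows. The names `Metrics` and `evaluate` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    false_positive_rate: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    """Score binary predictions (assumed: 1 = phishing, 0 = legitimate)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return Metrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        precision=precision,
        recall=recall,
        f1=_ratio(2 * precision * recall, precision + recall),
        false_positive_rate=_ratio(fp, fp + tn),
    )
```

Scoring every model with the same function on the same held-out data is one way to see that a model can lead on one metric, such as recall, while trailing on another, such as false positive rate.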
The past decade has witnessed the rapid development of natural language processing and machine learning in the phishing detection domain. However, research on word embedding and deep learning for malicious URL classification remains limited. To address this gap, this chapter examines the application of word embedding and deep learning to extracting features from website URLs. Several word embedding techniques, such as Keras, Word2Vec, GloVe, and FastText, were used to learn feature representations of webpage URLs. The resulting feature vectors were fed into a deep learning model based on CNN-BiGRU for feature extraction and classification. Numerous experiments were conducted on two different datasets, and various metrics were used to evaluate the phishing detection model's performance. The findings indicated that, when combined with deep learning, Keras outperformed the other text embedding methods and achieved the best results across all evaluation metrics on both datasets.
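Before any embedding can be learned, each URL has to be converted into a fixed-length sequence of integer ids. The abstract does not describe this step, so the following is only a sketch under stated assumptions: character-level tokens, id 0 reserved for padding, id 1 reserved for unseen characters, and truncation or right-padding to a fixed length. The function names `build_vocab` and `encode_url` are hypothetical.

```python
def build_vocab(urls):
    """Map each character seen in the training URLs to an integer id.

    Assumption: id 0 is reserved for padding and id 1 for unknown characters.
    """
    chars = sorted({ch for url in urls for ch in url})
    return {ch: index + 2 for index, ch in enumerate(chars)}


def encode_url(url, vocab, max_len=16):
    """Turn a URL into a list of exactly max_len integer ids."""
    # Truncate long URLs and map unseen characters to the unknown id (1).
    ids = [vocab.get(ch, 1) for ch in url[:max_len]]
    # Right-pad short URLs with the padding id (0).
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["ab.com"])
    print(encode_url("ab.cz", vocab, max_len=8))
```

Sequences of this shape are what a trainable embedding layer, or a lookup into pretrained vectors, would typically consume before the convolutional and recurrent layers.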