Although a variety of techniques for detecting malicious websites have been proposed, it is becoming increasingly difficult for these methods to provide satisfactory results. Many malicious websites can still escape detection by using various Web spam techniques. In this paper, we first summarize three types of Web spam techniques used by malicious websites: redirection spam, hidden IFrame spam, and content hiding spam. We then present a new detection method that adopts the user's perspective and takes screenshots of malicious webpages to invalidate Web spam. The proposed method uses a Convolutional Neural Network, a class of deep neural networks, as the classification algorithm. To verify the effectiveness of the method, we conducted two experiments. First, the proposed method was tested on a constructed complex dataset, and we present comparison results between the proposed method and representative machine learning-based detection algorithms. Second, the proposed method was used to detect malicious websites in a real-world Web environment for three months. The experimental results show that the proposed method achieves better performance and is applicable to a practical Web environment.
INDEX TERMS Convolutional neural network, machine learning, malicious website detection.
Phishing attacks, a significant security concern in cyberspace, continue to threaten organizations and Internet users. For organizations, the rise in the number of brands targeted by phishing has instilled distrust and dissatisfaction in legitimate Internet users and even damaged brand equity. Therefore, more fine-grained phishing detection mechanisms are urgently needed. In this study, we propose PTI-NN, a neural network-based model that uses category features and images to identify the target brands of phishing websites. We also contribute a new dataset of 3,500 phishing websites and present thirty phishing category features, which facilitate phishing detection research in cyber security. In PTI-NN, an embedding-based DNN processes the category features, a 2D-CNN processes the images, and a final fully connected layer predicts the target brand of each phishing website. The experimental results show that the proposed model classifies seventy phishing-targeted brands with an accuracy of 91.10%, demonstrating the effectiveness of our method for identifying phishing target brands.
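The two-branch design described in this abstract (embedded category features, a 2D convolution over the image, and a fully connected layer over the fused vector) can be illustrated with a minimal, untrained forward pass. This is only a sketch under stated assumptions and not the authors' PTI-NN implementation: the vocabulary size, embedding width, kernel size, filter count, and image size are hypothetical, the weights are random, and the embedding branch is reduced to a plain lookup with no hidden layers. Only the counts of thirty category features and seventy brands come from the abstract.

```python
import math
import random

random.seed(0)

NUM_FEATURES = 30  # thirty category features (from the abstract)
NUM_BRANDS = 70    # seventy target brands (from the abstract)
VOCAB_SIZE = 8     # assumption: distinct values per category feature
EMBED_DIM = 4      # assumption: embedding width
KERNEL = 3         # assumption: convolution kernel size
NUM_FILTERS = 2    # assumption: number of convolution filters


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# One embedding table per category feature.
embeddings = [rand_matrix(VOCAB_SIZE, EMBED_DIM) for _ in range(NUM_FEATURES)]
filters = [rand_matrix(KERNEL, KERNEL) for _ in range(NUM_FILTERS)]
FUSED_DIM = NUM_FEATURES * EMBED_DIM + NUM_FILTERS
fc_weights = rand_matrix(NUM_BRANDS, FUSED_DIM)


def embed_features(category_ids):
    """Look up and concatenate one embedding vector per category feature."""
    out = []
    for table, idx in zip(embeddings, category_ids):
        out.extend(table[idx])
    return out


def conv_pool(image, kernel):
    """Valid 2D cross-correlation with ReLU, then global average pooling."""
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(height - KERNEL + 1):
        for j in range(width - KERNEL + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(KERNEL) for b in range(KERNEL))
            total += max(s, 0.0)
            count += 1
    return total / count


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def predict_brand(category_ids, image):
    """Fuse both branches and apply one fully connected layer plus softmax."""
    fused = embed_features(category_ids) + [conv_pool(image, k) for k in filters]
    logits = [sum(w * x for w, x in zip(row, fused)) for row in fc_weights]
    return softmax(logits)


# Hypothetical inputs: random category ids and a 16x16 grayscale image.
category_ids = [random.randrange(VOCAB_SIZE) for _ in range(NUM_FEATURES)]
image = [[random.random() for _ in range(16)] for _ in range(16)]
probs = predict_brand(category_ids, image)
best_brand = max(range(NUM_BRANDS), key=probs.__getitem__)
```

Here `probs` holds one probability per brand and `best_brand` is the index of the most likely one. With random weights the prediction is meaningless; the sketch only shows how the categorical and image branches feed a shared classification layer.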