With the rapid development of large-scale complex networks and the proliferation of social network applications, the volume of network traffic data is growing rapidly, and efficient anomaly detection on such massive data is crucial to many network applications, such as malware detection, load balancing, and network intrusion detection. Although many methods exist for network traffic anomaly detection, they are all designed for a single machine and fail when the traffic data are too large for one computer to store and process. To solve this problem, we propose a parallel algorithm based on Isolation Forest and Spark for network traffic anomaly detection. It combines the strengths of the Isolation Forest algorithm in network traffic anomaly detection with the big data processing capability of Spark. We parallelize both the modeling and the evaluation processes: by assigning tasks to multiple compute nodes, the algorithm performs anomaly detection and evaluation efficiently and avoids the computation bottleneck of a single machine. Extensive experiments on real-world datasets show that the proposed algorithm is efficient and scales well for anomaly detection on large network traffic data.
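The reason Isolation Forest parallelizes well is that each tree is built from its own random subsample and never depends on another tree. The following is a minimal pure-Python sketch of that idea, not the authors' implementation: a local thread pool stands in for Spark compute nodes (it illustrates the task decomposition only, not a real speedup), and the function names, the two-dimensional toy data, and the parameters `n_trees`, `sample_size`, and `workers` are all assumptions made for illustration.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }

def fit_tree(task):
    """One independent unit of work: a distributed engine could run it on any node."""
    data, sample_size, seed = task
    rng = random.Random(seed)  # per-task seed keeps results reproducible
    sample = rng.sample(data, min(sample_size, len(data)))
    max_depth = math.ceil(math.log2(max(len(sample), 2)))
    return build_tree(sample, 0, max_depth, rng)

def fit_forest(data, n_trees=100, sample_size=64, workers=4):
    """Build all trees as independent tasks (threads stand in for compute nodes)."""
    tasks = [(data, sample_size, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_tree, tasks))

def path_length(tree, point, depth=0):
    """Depth at which the point lands, adjusted for the unsplit leaf size."""
    if "dim" not in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)

def anomaly_score(forest, point, sample_size=64):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_depth = sum(path_length(t, point) for t in forest) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(sample_size))

# Toy data (hypothetical): a Gaussian cluster of "normal" records plus one far-away record.
data_rng = random.Random(0)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(300)] + [(10.0, 10.0)]
forest = fit_forest(data)
outlier_score = anomaly_score(forest, (10.0, 10.0))
inlier_score = anomaly_score(forest, (0.0, 0.0))
```

Scoring is independent per record in the same way, so the evaluation step can be split across workers just like tree construction.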
The Internet of Things (IoT) has brought great convenience to people's daily lives. At the same time, IoT devices face severe attacks. Attackers use a variety of methods to intrude into IoT systems, exposing them to a large number of targeted, concealed, and penetrating threats that seriously challenge IoT privacy. Existing methods and technologies cannot fully identify the attacker's attack process or protect the privacy of the IoT. Alarm correlation methods can construct a complete attack scenario and identify the attacker's intention by correlating alarm data, which provides effective protection for user privacy. However, existing alarm correlation methods still suffer from low correlation accuracy, poor correlation efficiency, and strong dependence on a knowledge base. To address these issues, we propose an alarm correlation method based on the Affinity Propagation (AP) clustering algorithm and causal relationships. Our method assumes that alarms triggered by the same attack process are highly similar, adopts the AP algorithm to improve correlation efficiency, and reconstructs the complete attack process based on causal correlation. The new method correlates alarms effectively and builds a complete attack process, helping managers identify attack intentions and prevent attacks.
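The clustering step can be illustrated with a small pure-Python sketch of Affinity Propagation applied to alarm records. It is a sketch under stated assumptions rather than the authors' method: the alarm fields (`src`, `dst`, `minute`), the similarity function and its weights, and the choice of the median similarity as the preference are all hypothetical, and the causal correlation step is not covered.

```python
from statistics import median

def similarity(a, b):
    """Hypothetical alarm similarity: 0 for identical alarms, more negative as they differ."""
    gap = (a["minute"] - b["minute"]) ** 2
    penalty = 20 * (a["src"] != b["src"]) + 20 * (a["dst"] != b["dst"])
    return -float(gap + penalty)

def affinity_propagation(S, damping=0.5, iterations=200):
    """Standard AP message passing; returns the exemplar index chosen by each item."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: how much support k has collected from the other items.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # self-responsibility is not clipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

def cluster_alarms(alarms):
    """Build the similarity matrix, set the preference on the diagonal, and run AP."""
    n = len(alarms)
    S = [[similarity(a, b) for b in alarms] for a in alarms]
    preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = preference
    return affinity_propagation(S)

# Toy alarms (hypothetical): two attack processes with different source and target hosts.
alarms = [
    {"src": "10.0.0.5", "dst": "192.168.1.10", "minute": 0},
    {"src": "10.0.0.5", "dst": "192.168.1.10", "minute": 2},
    {"src": "10.0.0.5", "dst": "192.168.1.10", "minute": 4},
    {"src": "10.0.0.9", "dst": "192.168.1.20", "minute": 1},
    {"src": "10.0.0.9", "dst": "192.168.1.20", "minute": 3},
    {"src": "10.0.0.9", "dst": "192.168.1.20", "minute": 5},
]
labels = cluster_alarms(alarms)
```

Each entry of `labels` is the index of the alarm chosen as exemplar for that alarm, so alarms sharing a label form one candidate attack process. AP does not need the number of clusters in advance, which suits alarm streams where the number of ongoing attacks is unknown.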