Late detection of security threats significantly increases the risk of irreparable damage and limits any defense attempt. In this paper, we propose a sCAlable TRAffic Classifier and Analyzer (CATRACA). CATRACA works as an efficient online Intrusion Detection and Prevention System implemented as a Virtualized Network Function. CATRACA is based on Apache Spark, a Big Data stream processing system, and is deployed over the Open Platform for Network Functions Virtualization (OPNFV), providing an accurate real-time threat-detection service. The system offers a user-friendly graphical interface that provides real-time visualization of the traffic and of the attacks that occur in the network. Our prototype differentiates normal traffic from denial of service (DoS) attacks and vulnerability probes with over 95% accuracy on three different datasets. Moreover, CATRACA handles streaming data with concept drift detection, achieving more than 85% accuracy.
KEYWORDS

big data, network traffic classification, stream processing, threat detection, virtual network function
INTRODUCTION

The Internet is undergoing constant change, from the diversity of its users and the complexity of its applications to the heterogeneity of its information producers [1]. As a consequence, traffic monitoring, a critical task in maintaining the stability, reliability, and security of computer networks, faces new challenges [2]. Current network monitoring tools are inadequate for the speed and management needs of large network domains.

To ensure network security, new systems must be designed, since current security systems such as Security Information and Event Management (SIEM) are inadequate. While 82% of security threats occur in minutes, an intrusion can take up to 8 months to be detected [3]. Detection time must be as short as possible for intrusion prevention to be effective [4].

Security incidents have grown in complexity, and simple packet analysis and filtering are no longer sufficient. Attackers try to hide malicious traffic from security tools by forging the source IP address and dynamically changing TCP ports. In this context, a promising alternative for classifying network traffic and detecting threats is to apply Machine Learning (ML) techniques. These techniques are well suited to big data, since the methods become more effective as more samples are available to train the classifier [5]. With a large number of features, however, ML techniques produce results with high latency due to their computational resource consumption. This high latency is a disadvantage for applications that use machine learning for real-time classification. For example, network monitoring applications must analyze data and detect threats as quickly as possible. In this context, real-time stream processing allows the immediate analysis of different types of data and consequently benefits traffic monitoring for security threat detection. Open source distributed processing platforms such as Apache Storm [6], Apache Flink [7], and Apache Spark [8] process big data w...