Summary
Network anomalies and attacks represent a serious challenge to ISPs, who need to cope with an increasing number of unknown events that put their networks' integrity at risk. Most of the network anomaly detection systems proposed so far employ a supervised strategy to accomplish their task, using either signature‐based detection methods or supervised‐learning techniques. The former fails to detect unknown anomalies, exposing the network to severe consequences; the latter requires labeled traffic, which is difficult and expensive to produce. In this paper, we introduce a powerful unsupervised approach to detect and characterize network anomalies in the dark, that is, without relying on signatures or labeled traffic. Unsupervised detection is accomplished by means of robust clustering techniques, combining subspace clustering with correlation analysis to blindly identify anomalies. To alleviate network operator's post‐processing tasks and to speed up the deployment of effective countermeasures, anomaly ranking and characterization are automatically performed on the detected events. The system is extensively tested with real traffic from the Widely Integrated Distributed Environment backbone network, spanning 6years of flows captured from a trans‐Pacific link between Japan and the USA, using the MAWILab framework for ground‐truth generation. We additionally evaluate the proposed approach with synthetic data, consisting of traffic from an operational network with synthetic attacks. Finally, we compare the performance of the unsupervised detection against different previously used unsupervised detection techniques, as well as against multiple anomaly detectors used in MAWILab. Copyright © 2015 John Wiley & Sons, Ltd.