Abstract. Entropy-based anomaly detection has recently been studied extensively as a way to overcome the weaknesses of traditional volume-based and rule-based approaches to network flow analysis. Of the many entropy measures, only Shannon, Titchener, and the parameterized Renyi and Tsallis entropies have been applied to network anomaly detection. In this paper, we present our method based on parameterized entropy and supervised learning. With this method we are able to detect a broad spectrum of anomalies with a low false positive rate. In addition, we provide information revealing the anomaly type. The experimental results suggest that our method performs better than Shannon-based and volume-based approaches.
Keywords: anomaly detection, entropy, netflow, network traffic measurement
Introduction
The number of anomalies in IP networks caused by worm-like activities is growing [2]. Widely used security solutions based on signatures or rules, such as firewalls, antivirus software and intrusion detection systems, do not provide sufficient protection because they cannot cope with evasion techniques and previously unknown (0-day) attacks [12], [13]. Therefore, network anomaly detection, as one of the possible solutions, is becoming an essential area of research. Anomaly detection is the identification of observations that do not conform to expected behavior. Supervised anomaly detection requires a labeled data set on which a classifier is trained.
Anomaly detectors have many problems that must be addressed. The main challenge is setting a precise boundary between normal and anomalous behavior in order to avoid a high false positive rate or a low detection rate. Other problems are long computation time, extraction of anomaly details, and root-cause identification [7].
In our previous work [4], some generalizations of entropy were described in detail and preliminary results of using parameterized entropies were presented. In this paper, we make two major contributions. Firstly, we present our method and its results in comparison with Shannon-based and volume-based approaches. Secondly, we describe the data set as well as the method we used to generate anomalies.
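To make the entropy measures named above concrete, the following minimal sketch computes Shannon, Renyi and Tsallis entropy over the empirical distribution of a single flow feature. It uses the standard textbook definitions; the function names, the choice of destination port as the feature, and the sample values are assumptions for illustration only and are not taken from the paper, whose method may normalize or parameterize the entropies differently.

```python
import math
from collections import Counter


def probabilities(values):
    """Empirical probability of each distinct value in a feature column."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def shannon_entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def renyi_entropy(probs, alpha):
    """Renyi entropy in bits: log2(sum(p^alpha)) / (1 - alpha)."""
    if alpha == 1:
        # The limit alpha -> 1 is the Shannon entropy.
        return shannon_entropy(probs)
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


def tsallis_entropy(probs, alpha):
    """Tsallis entropy: (1 - sum(p^alpha)) / (alpha - 1)."""
    if alpha == 1:
        # The limit alpha -> 1 is the Shannon entropy with natural logarithm.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1 - sum(p ** alpha for p in probs if p > 0)) / (alpha - 1)


# Hypothetical destination ports seen in two time windows.
normal_ports = [80] * 6 + [443] * 3 + [53]   # traffic concentrated on few ports
scan_ports = list(range(1, 11))              # one flow per port, as in a port scan

for name, ports in (("normal", normal_ports), ("scan", scan_ports)):
    p = probabilities(ports)
    print(name,
          round(shannon_entropy(p), 3),
          round(renyi_entropy(p, 2.0), 3),
          round(tsallis_entropy(p, 2.0), 3))
```

In this toy example the scan-like window spreads flows evenly over many ports, so all three measures are higher than for the concentrated window. The parameter `alpha` controls how strongly frequent or rare values influence the result, which is what distinguishes the parameterized entropies from the Shannon entropy.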