Abstract-Detecting anomalies and unusual incidents in computer network traffic is a major challenge for both researchers and network administrators. An efficient method that detects network traffic anomalies quickly and accurately would allow them to prevent the security problems or network congestion that such anomalies cause. We therefore conducted a series of experiments to examine which interval-based network traffic features affect anomaly detection, and how, using three well-known machine learning algorithms: naïve Bayes, k-nearest neighbor, and support vector machine. Our findings should help researchers and network administrators select effective interval-based features for each particular type of anomaly and choose a suitable machine learning algorithm for their own network systems.
Index Terms-Network traffic, anomaly detection, naïve Bayes, nearest neighbor, support vector machine.
Manuscript received June 15, 2013; revised December 25, 2013. This work was supported by the Faculty Members Development Scholarship Program of Bangkok University, Thailand.
Kriangkrai Limthong is with the Department of Informatics, Graduate University of Advanced Studies (Sokendai), Japan (e-mail: kriangkrai.l@bu.ac.th).
I. INTRODUCTION
One of the crucial responsibilities of network administrators is discovering anomalies and unusual incidents in computer network systems. The forms and causes of anomalies vary considerably and produce a variety of network problems, such as network congestion or security problems. Examples of network anomalies and unusual incidents are denial of service (DoS) attacks, spreading viruses or worms, outages, misconfigurations, and flash crowds. If network administrators had an automatic mechanism that quickly detected unknown anomalies or unusual incidents, they could avoid the serious consequences of such anomalies. An automatic mechanism for detecting unknown anomalies in computer network traffic would therefore be attractive.
According to several studies [1]-[3], we can categorize detection methods into two major groups: signature-based methods and statistical-based methods.
The signature-based methods, such as Snort [4], Suricata, or Bro [5], monitor packets and compare them with predetermined attack patterns known as signatures. This is a simple and efficient way to examine network traffic, and its false positive rate can be low. However, comparing network packets or flows against a large set of signatures is time-consuming, and the approach has limited predictive capability. In addition, the signature-based methods cannot detect novel anomalies that are not defined in signatures. This means that administrators have to update the system signatures frequently.
The statistical-based methods, however, can learn the behavior of network traffic and possibly detect novel anomalies and unusual incidents.
Many researchers have studied particular techniques, such as statistical profiling using histograms [6], parametric statistical modeling [7], non-p...