SUMMARY
In this paper, we study the effects of anomalies on the distribution of the TCP flow interarrival-time process. We show empirically that, despite the variety of data networks in size, number of users, applications, and load, the interarrival times of normal flows comply with the Weibull distribution, whereas specific irregularities (anomalies) cause deviations from this distribution. We first estimate the scale and shape parameters and then measure the discrepancy between the data and a Weibull distribution with the estimated parameters. We also use the Weibull counting model to recheck the conformance of small flow interarrival times with the distribution. We perform our experiments on a diverse set of traffic data sets, ranging from backbone links to endpoints of academic and commercial networks. Moreover, as a possible application of our findings, we propose a window-based anomaly detection method: in each window, we estimate the Weibull parameters of the interarrival times, check the discrepancy between the data and a Weibull distribution with those parameters, and raise an alarm whenever the difference is significant. We apply this method to one of our data sets and present the results to illustrate the idea and demonstrate its ability to detect volume anomalies.
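A minimal sketch of the window-based scheme follows, assuming non-overlapping windows that each hold a fixed number of interarrival times. It is illustrative, not the authors' implementation: parameters are estimated per window by median-rank regression on a Weibull probability plot, and, as a swapped-in discrepancy measure, a window is flagged when the plot's coefficient of determination R^2 falls below a threshold. The function names, the window size of 500, and the threshold 0.95 are hypothetical choices that would need tuning on real traffic; a formal test such as chi-square could replace the R^2 check.

```python
import math
import random


def weibull_plot_fit(samples):
    """Median-rank regression on a Weibull probability plot.

    Returns (shape k, scale lam, r2). Plots y = ln(-ln(1 - F_i)) against
    x = ln(t_i), with F_i from Bernard's approximation (i - 0.3) / (n + 0.4),
    and fits y = k*x - k*ln(lam) by least squares; r2 near 1 suggests the
    sample is consistent with a Weibull distribution.
    """
    t = sorted(v for v in samples if v > 0)  # ln() needs positive times
    n = len(t)
    if n < 3:
        raise ValueError("window too small to fit")
    xs = [math.log(v) for v in t]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all interarrival times are identical")
    k = sxy / sxx                 # slope of the plot = shape parameter
    lam = math.exp(mx - my / k)   # from intercept my - k*mx = -k*ln(lam)
    r2 = sxy * sxy / (sxx * syy)  # squared correlation of the plot
    return k, lam, r2


def detect_anomalous_windows(interarrivals, window=500, min_r2=0.95):
    """Indices of non-overlapping windows whose Weibull plot R^2 < min_r2.

    window and min_r2 are hypothetical defaults, not values from the paper.
    """
    alarms = []
    for w, start in enumerate(range(0, len(interarrivals) - window + 1, window)):
        try:
            _, _, r2 = weibull_plot_fit(interarrivals[start:start + window])
        except ValueError:
            r2 = 0.0              # degenerate window: treat as anomalous
        if r2 < min_r2:
            alarms.append(w)
    return alarms


if __name__ == "__main__":
    random.seed(7)
    # Normal traffic: Weibull interarrivals with scale 2.0 and shape 0.7.
    normal = [random.weibullvariate(2.0, 0.7) for _ in range(2000)]
    # Synthetic volume anomaly: 200 near-zero interarrivals inside one window.
    flood = [0.001] * 200 + [random.weibullvariate(2.0, 0.7) for _ in range(300)]
    series = normal[:1000] + flood + normal[1000:]
    print(detect_anomalous_windows(series))
```

In the example, the third window (index 2) packs 200 identical, near-zero interarrivals into the lower part of the probability plot, which bends the plot away from a straight line and pushes R^2 well below the threshold, while windows of Weibull-distributed interarrivals typically stay close to 1.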
IP packet traffic is known to exhibit long-range dependence and self-similar properties. However, TCP flows (sets of related IP packets that form a TCP connection), which are considered to be generated by a large population of users and are therefore mutually independent, seem to be best modeled either by Poisson processes with exponential interarrival times or by heavy-tailed distributions such as the Weibull distribution. In this paper, we show that, regardless of the number of active nodes in a network, the interarrival times of TCP flows in "normal traffic" conform to the Weibull distribution, and that any irregularity in the traffic causes deviations in the distribution of the interarrival times and can therefore be detected. This leads to a straightforward anomaly detection method that identifies the anomalous part(s) of the traffic. We first apply the median-rank method to estimate the Weibull parameters of the traffic, then check the conformity of the data against a Weibull distribution with the estimated parameters, and finally decide whether the traffic is normal based on the chi-square test.
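The estimation and test steps can be sketched with the Python standard library as below. This is a minimal sketch under stated assumptions, not the authors' implementation: median ranks use Bernard's approximation F_i = (i - 0.3)/(n + 0.4), the line ln(-ln(1 - F)) = k ln t - k ln(lam) is fitted by ordinary least squares, and the chi-square test uses 10 equiprobable bins under the fitted Weibull. The bin count, the 5% significance level, and the function names (`fit_weibull_median_rank`, `weibull_chi_square`, `is_weibull`) are illustrative assumptions. With two estimated parameters the test has 10 - 1 - 2 = 7 degrees of freedom, giving a critical value of 14.067; because the parameters come from median-rank regression rather than from grouped maximum likelihood, this significance level is only approximate.

```python
import bisect
import math
import random

# Upper 5% point of the chi-square distribution with 7 degrees of freedom
# (10 bins - 1 - 2 estimated parameters); assumed bin count and level.
CHI2_CRIT_DF7_ALPHA05 = 14.067


def fit_weibull_median_rank(samples):
    """Return (shape k, scale lam) estimated by median-rank regression."""
    t = sorted(v for v in samples if v > 0)  # ln() needs positive times
    n = len(t)
    if n < 2:
        raise ValueError("need at least two positive interarrival times")
    xs = [math.log(v) for v in t]
    # Bernard's approximation of the median rank of the i-th order statistic.
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all interarrival times are identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx                  # slope = shape parameter
    lam = math.exp(mx - my / k)    # intercept my - k*mx equals -k*ln(lam)
    return k, lam


def weibull_chi_square(samples, k, lam, bins=10):
    """Pearson chi-square statistic over `bins` equiprobable Weibull bins."""
    # Bin edges are the fitted quantiles lam * (-ln(1 - j/bins))^(1/k).
    edges = [lam * (-math.log(1.0 - j / bins)) ** (1.0 / k) for j in range(1, bins)]
    observed = [0] * bins
    for v in samples:
        observed[bisect.bisect_right(edges, v)] += 1
    expected = len(samples) / bins
    return sum((o - expected) ** 2 / expected for o in observed)


def is_weibull(samples):
    """True if the chi-square test does not reject the fitted Weibull at 5%."""
    data = [v for v in samples if v > 0]
    k, lam = fit_weibull_median_rank(data)
    return weibull_chi_square(data, k, lam, bins=10) <= CHI2_CRIT_DF7_ALPHA05


if __name__ == "__main__":
    random.seed(1)
    normal = [random.weibullvariate(2.0, 0.7) for _ in range(5000)]
    anomalous = normal + [0.001] * 2000   # hypothetical burst of tiny gaps
    print(fit_weibull_median_rank(normal), is_weibull(normal), is_weibull(anomalous))
```

Equiprobable bins keep every expected count equal (here n/10), which avoids the sparse-tail bins that make the chi-square approximation unreliable for skewed distributions such as the Weibull with shape below 1.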