Stream frequency measurements are fundamental in many data stream applications such as financial data trackers, intrusion-detection systems, and network monitoring. Typically, recent data items are more relevant than old ones, a notion we can capture through a sliding window abstraction. This paper considers a generalized sliding window model that supports stream frequency queries over an interval given at query time. This enables drill-down queries, in which we examine the behavior of the system at increasingly finer granularities. For this model, we asymptotically improve the space bounds of existing work, reduce the update and query time to a constant, and provide deterministic solutions. When evaluated over real Internet packet traces, our fastest algorithm processes items 90 to 250 times faster, serves queries at least 730 times faster, and consumes at least 40% less space than the best known method.
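To make the interval query model concrete, the following is a minimal exact baseline, not the method proposed in this paper. It assumes a maximal window of W items and answers "how often did item x appear among the last i items?" for any 1 <= i <= W chosen at query time. The class name `ExactIntervalCounter` and its methods `add` and `query` are hypothetical names used only for illustration. Because it stores an arrival time for every item in the maximal window, it uses space linear in W, which is precisely the cost that compact approximate sketches aim to avoid.

```python
from bisect import bisect_right
from collections import defaultdict, deque


class ExactIntervalCounter:
    """Hypothetical exact baseline for interval frequency queries.

    Keeps every item of the maximal window (size `window`). A query
    (item, i) returns how many times `item` appeared among the most
    recent i items, for any 1 <= i <= window given at query time.
    Space is linear in `window`; no approximation is involved.
    """

    def __init__(self, window):
        self.window = window
        self.t = 0                       # number of items processed so far
        self.times = defaultdict(deque)  # item -> arrival times still in the window
        self.order = deque()             # (time, item) pairs in arrival order

    def add(self, item):
        self.t += 1
        self.times[item].append(self.t)
        self.order.append((self.t, item))
        # Expire the item that just left the maximal window; it is the
        # oldest recorded arrival of that item, so popleft removes it.
        if len(self.order) > self.window:
            _, old = self.order.popleft()
            self.times[old].popleft()
            if not self.times[old]:
                del self.times[old]

    def query(self, item, i):
        """Exact frequency of `item` among the last i items (1 <= i <= window)."""
        if not 1 <= i <= self.window:
            raise ValueError("interval must satisfy 1 <= i <= window")
        ts = self.times.get(item)
        if not ts:
            return 0
        # Arrivals with time > t - i fall inside the queried interval.
        return len(ts) - bisect_right(ts, self.t - i)
```

For example, after feeding the stream `a b a c a b a` into a counter with `window=5`, the maximal window holds `a c a b a`, so `query("a", 5)` is 3 and `query("a", 2)` is 1. Drill-down queries correspond to shrinking `i` across successive calls.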
Introduction

High-performance stream processing is essential for many applications such as financial data trackers, intrusion-detection systems, network monitoring, and sensor networks. Such applications require algorithms that are both time- and space-efficient to cope with high-speed data streams. Space efficiency matters because of the memory hierarchy: a small footprint enables cache residency and avoids page swapping. This residency is vital for obtaining good performance, even when the theoretical computational cost is small (e.g., constant-time algorithms may be inefficient if they access DRAM for each element). To that end, stream processing algorithms often build compact approximate sketches (synopses) of the input streams.

Recent items are often more relevant than old ones, which requires an aging mechanism for the sketches. Many applications realize this by tracking the stream's items over a sliding window. That is, the sliding window model [18] considers only a window of the most recent items in the stream, while older ones do not affect the quantity we wish to estimate. Indeed, the problem of maintaining different types of sliding window statistics has been studied extensively [4,8,18,33,27].

Yet, sometimes the window of interest may not be known a priori, or there may be multiple windows of interest [17]. Further, the ability to perform drill-down queries, in which we examine the behavior of the system at increasingly finer granularity, may also be beneficial, especially for security applications. For example, this enables detecting precisely when a particular anomaly started and who was involved in it [20]. Additional applications for this capability include identifying the sources of flash crowd effects and pinpointing the cause-and-effect relations surrounding a surge in demand on an e-commerce website [26].

In this work, we study a model that allows the user to specify an interval of interest at query time.
This generalizes the traditional sliding window model, which only considers fixed-size windows. As depicted in Figure 1, a sub-interval of a maximal window is passed as a parameter for each query, and the goal of the algorithm...