Virtualized Cloud platforms have become increasingly common and the number of online services hosted on these platforms is also increasing rapidly. A key problem faced by providers in managing these services is detecting the performance anomalies and adjusting resources accordingly. As online services generate a very large amount of monitored data in the form of time series, it becomes very difficult to process this complex data by traditional approaches. In this work, we present a novel distributed parallel approach for performance anomaly detection. We build upon Holt-Winters forecasting for automatic aberrant behavior detection in time series. First, we extend the technique to work with MapReduce paradigm. Next, we correlate the anomalous metrics with the target Service Level Objective (SLO) in order to locate the suspicious metrics. We implemented and evaluated our approach on a production Cloud encompassing IaaS and PaaS service models. Experimental results confirm that our approach is efficient and effective in capturing the metrics causing performance anomalies in large time series datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.