Apache Storm is a distributed processing engine that can reliably process unbounded streams of data for real-time applications. While recent research has mostly focused on devising resource allocation and task scheduling algorithms to satisfy the high-performance or low-latency requirements of Storm applications across distributed, multi-core systems, finding a solution that optimizes the energy consumption of running applications remains an important open research question. In this article, we present a CPU throttling control strategy that continuously optimizes the energy consumption of a Storm platform by adjusting the voltage and frequency of the CPU cores while running the assigned tasks under latency constraints defined by the end-users. Experimental results on a Storm cluster with 4 physical nodes (24 cores in total) validate the effectiveness of the proposed solution when running multiple compute-intensive operations. In particular, the proposed controller can keep the latency of analytic tasks, in terms of the 99th latency percentile, within the quality-of-service requirement specified by the end-user while reducing the total energy consumption by 18% on average across the entire Storm platform.
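The control loop described above can be illustrated with a minimal sketch. This is not the authors' controller: the frequency levels, the inverse latency-frequency model, and the hysteresis thresholds below are all hypothetical assumptions chosen only to show the idea of stepping DVFS states up when the observed 99th-percentile latency violates the QoS target and stepping them down when there is ample headroom, so the platform saves energy without breaking the latency constraint.

```python
# Hedged sketch of a DVFS feedback controller (hypothetical values throughout).
FREQ_LEVELS_MHZ = [1200, 1600, 2000, 2400]  # assumed available CPU P-states

def next_freq_index(idx: int, latency_p99_ms: float, target_ms: float) -> int:
    """One control decision: compare observed p99 latency to the QoS target."""
    if latency_p99_ms > target_ms and idx < len(FREQ_LEVELS_MHZ) - 1:
        return idx + 1          # QoS violated: raise frequency to cut latency
    if latency_p99_ms < 0.8 * target_ms and idx > 0:
        return idx - 1          # ample headroom: throttle down to save energy
    return idx                  # hysteresis band: hold the current frequency

def observed_p99(freq_mhz: float) -> float:
    # Toy workload model: p99 latency assumed inversely proportional to
    # frequency (180000 is an arbitrary constant, not a measured value).
    return 180000.0 / freq_mhz

def simulate(target_ms: float = 100.0, epochs: int = 20) -> int:
    idx = len(FREQ_LEVELS_MHZ) - 1      # start at the maximum frequency
    for _ in range(epochs):
        idx = next_freq_index(idx, observed_p99(FREQ_LEVELS_MHZ[idx]), target_ms)
    return FREQ_LEVELS_MHZ[idx]
```

Under this toy model the loop settles at 2000 MHz: at 2400 MHz the modeled p99 (75 ms) sits well below the 100 ms target, so the controller throttles down, while at 2000 MHz the p99 (90 ms) lands inside the hysteresis band and the frequency holds, which is the energy-saving steady state the article's controller aims for.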
KEYWORDS: data stream processing engines, energy-aware resource allocation algorithm, performance evaluation of computer systems
INTRODUCTION

There is an unprecedented increase in the complexity, volume, and velocity* of data in both industrial and financial sectors in comparison to previous decades. Such a shift immensely affects the conventional methodologies for collecting, ingesting, and processing data items, which can be generated from a diverse set of sources (such as distributed sensors). Examples of such applicability can be found in different contexts, from traffic controllers to fraud detection systems in the financial and health sectors. The widespread growth of programming models and the evolution of streaming data processing technologies,† developed by giant organizations in recent years, indicate the available capital and high demand for technologies that help businesses make real-time decisions. Such a trend calls for efficient data management technologies, high-speed storage systems, and more cost-effective processing tools, not only to cope with the explosion of enterprise data but also to satisfy the low-latency requirement for processing time-sensitive and mission-critical data in almost every modern data-driven application. Streaming data processing technologies have recently emerged as valuable tools that can effectively provide such benefits when dealing with high volumes of data generated at high rates from various sources. Apache Storm is among the modern stream processing engines that provide a set of tools (i.e., application programming interfaces) to facilitate the development and execution of data-driven applications in a distributed manner.6

Real-time task scheduling and computing resource allocation is a research field ...