Latency-aware elastic scaling for distributed data stream processing systems

Heinze, Thomas; Jerzak, Zbigniew; Hackenbroich, Gregor; Fetzer, Christof

doi:10.1145/2611286.2611294

Cited by 89 publications

(68 citation statements)

References 15 publications


“…The systems uses as reward a weighted average of the difference between the current value and respective target system utilization. An extension of this approach is presented in Heinze et al (2014a), where the authors try to minimize the number of latency violation maximizing the utilization values. In this case, there are decisions that are labeled as optional, and that can be cancelled or postponed in case the estimated latency spike is too high.…”

Section: Article In Press (mentioning)

confidence: 99%

Self-adaptive processing graph with operator fission for elastic stream processing

Hidalgo, Wladdimiro, Rosas, 2017

Journal of Systems and Software


“…The configuration of such parameters is known to be complex and scenario-specific [14]. Therefore, we manually trained these parameters based on previous experiments [16,17] and the following assumptions:…”

Section: Parameter Configuration (mentioning)

confidence: 99%

“…In this section we present an overview of the data stream processing system [16,17] used as a foundation of our prototype. The architecture of our prototype is outlined in Figure 1.…”

Section: Introduction (mentioning)

confidence: 99%

An adaptive replication scheme for elastic data stream processing systems

Heinze, Zia, Krahn, et al. 2015

Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems

Self Cite


A major challenge for cloud-based systems is to be fault tolerant to cope with an increasing probability of faults in cloud environments. This is especially true for in-memory computing solutions like data stream processing systems, where a single host failure might result in an unrecoverable information loss. In state-of-the-art data streaming systems either active replication or upstream backup are applied to ensure fault tolerance, which have a high resource overhead or a high recovery time, respectively. This paper combines these two fault tolerance mechanisms in one system to minimize the number of violations of a user-defined recovery time threshold and to reduce the overall resource consumption compared to active replication. The system switches for individual operators between both replication techniques dynamically based on the current workload characteristics. Our approach is implemented as an extension of an elastic data stream processing engine, which is able to reduce the number of used hosts due to the smaller replication overhead. Based on a real-world evaluation we show that our system is able to reduce the resource usage by up to 19% compared to an active replication scheme.


“…In [12] they propose a greedy heuristic to find an optimal operator placement in polynomial time and in [13] they propose to find an operator placement that is "resilient" to change, meaning that it does not have to be changed upon load changes. Heinze et al. model the problem of operator placement in Borealis as a bin-packing problem and use a first-fit heuristic to assign operators to machines (bins) [14].…”

Section: A Workload Scheduling In Distributed Systems (mentioning)

confidence: 99%
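
The first-fit bin-packing idea mentioned in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function name `first_fit`, the operator names, and the assumption that each operator has a single scalar load and each host a uniform capacity are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def first_fit(operator_loads: Dict[str, float],
              host_capacity: float) -> List[Dict[str, float]]:
    """Assign operators to hosts with the first-fit heuristic.

    Hypothetical model: every operator has one scalar load and every
    host (bin) offers the same capacity.
    """
    hosts: List[Dict[str, float]] = []   # operators placed on each host
    remaining: List[float] = []          # free capacity left on each host

    for name, load in operator_loads.items():
        if load > host_capacity:
            raise ValueError(f"operator {name!r} exceeds host capacity")
        # Scan hosts in opening order and take the first one with room.
        for i, free in enumerate(remaining):
            if load <= free:
                hosts[i][name] = load
                remaining[i] -= load
                break
        else:
            # No existing host fits, so open a new one.
            hosts.append({name: load})
            remaining.append(host_capacity - load)
    return hosts


# Illustrative loads as fractions of one host's capacity (made-up values).
placement = first_fit(
    {"filter": 0.5, "join": 0.75, "aggregate": 0.25, "sink": 0.25},
    host_capacity=1.0,
)
print(placement)
```

First-fit processes operators in the order given and never revisits a decision, which keeps it cheap but means the result depends on input order and is not guaranteed to use the minimum number of hosts.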

Workload scheduling in distributed stream processors using graph partitioning

Fischer, Bernstein, 2015

2015 IEEE International Conference on Big Data (Big Data)


With ever increasing data volumes, large compute clusters that process data in a distributed manner have become prevalent in industry. For distributed stream processing platforms (such as Storm) the question of how to distribute workload to available machines has important implications for the overall performance of the system. We present a workload scheduling strategy that is based on a graph partitioning algorithm. The scheduler is application agnostic: it collects the communication behavior of running applications and creates the schedules by partitioning the resulting communication graph using the METIS graph partitioning software. As we build upon graph partitioning algorithms that have been shown to scale to very large graphs, our approach can cope with topologies with millions of tasks. While the experiments in this paper assume static data loads, our approach could also be used in a dynamic setting. We implemented our proposed algorithm for the Storm stream processing system and evaluated it on a commodity cluster with up to 80 machines. The evaluation was conducted on four different use cases: three using synthetic data loads and one application that processes real data. We compared our algorithm against two state-of-the-art scheduler implementations and show that our approach offers significant improvements in terms of resource utilization, enabling higher throughput at reduced network loads. We show that these improvements can be achieved while maintaining a balanced workload in terms of CPU usage and bandwidth consumption across the cluster. We also found that the performance advantage increases with message size, providing an important insight for stream-processing approaches based on micro-batching.

