Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads

Geldenhuys, Morgan K.; Scheinert, Dominik; Kao, Odej; Thamsen, Lauritz

doi:10.1109/icws55610.2022.00041

Cited by 6 publications (11 citation statements)

References 23 publications (27 reference statements)

“…The job was modified to enable it to consume events from and publish results to separate Apache Kafka topics. To simulate workload variations, represented by the changing number of vehicles over time, we used the SUMO simulation tool to generate a 24-hour workload dataset, specifically employing the TAPASCologne scenario¹³. Similar to the YSB experiment, we reduced this to 18 hours by sub-sampling every 4th data point and then repeating the resulting workload three times.…”

Section: Top Speed Windowing (TSW) Experiment (mentioning)
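
The arithmetic behind the quoted preprocessing step is that keeping every 4th data point shortens a 24-hour trace to 6 hours when it is replayed at the original sampling rate, and three repetitions give 18 hours. The following is a minimal sketch, assuming the dataset is a list with one event-rate value per second; the sampling resolution, the synthetic rates, and the helper name `compress_workload` are hypothetical and not taken from the paper.

```python
from typing import List

def compress_workload(rates: List[int], step: int = 4, repeats: int = 3) -> List[int]:
    """Sub-sample a workload trace and repeat it (hypothetical helper).

    Keeping every `step`-th data point shortens the trace by a factor of
    `step` when replayed at the original sampling rate; the shortened
    trace is then concatenated `repeats` times.
    """
    if step < 1 or repeats < 1:
        raise ValueError("step and repeats must be positive")
    subsampled = rates[::step]   # every step-th data point, starting at index 0
    return subsampled * repeats  # list repetition concatenates the copies

# Synthetic 24-hour trace with one rate value per second (assumed resolution).
SECONDS_PER_HOUR = 3600
trace_24h = [1000 + (t % SECONDS_PER_HOUR) for t in range(24 * SECONDS_PER_HOUR)]

replay = compress_workload(trace_24h)
print(len(replay) / SECONDS_PER_HOUR)  # 18.0
```

Repeating the compressed day, rather than replaying the full 24 hours once, exposes the job to the same daily pattern several times within a shorter experiment.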

“…Petrov et al. [27] detail a model that bases scaling actions on latency measurements, and DS2 [21] uses historical and real-time data for workload forecasting to dynamically scale streaming dataflows. In our previous work with Phoebe [13], initial profiling was conducted to establish models that map scale-out and workload rates to latency and recovery times. Time series forecasting (TSF) was then employed to predict future workloads, allowing for dynamic rescaling of resources aimed at maintaining stable latencies and achieving optimal recovery times.…”

Section: Stream Processing Optimization (mentioning)
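
To make the quoted description more concrete, the sketch below shows one way profiled models and a workload forecast could combine into a rescaling decision: size the job for the forecast peak by picking the smallest scale-out whose modeled latency and recovery time both meet their targets. The closed-form latency and recovery functions, all constants, and the names `choose_scaleout`, `predicted_latency_ms`, and `predicted_recovery_s` are hypothetical stand-ins for Phoebe's profiling-derived models, not the paper's implementation.

```python
import math
from typing import Optional, Sequence

# Hypothetical stand-ins for profiling-derived models; all constants are assumed.
CAPACITY_PER_INSTANCE = 10_000.0  # events/s one instance can sustain (assumed)
BASE_LATENCY_MS = 50.0            # latency at near-zero load (assumed)
FAILURE_DOWNTIME_S = 30.0         # detection + restart time after a failure (assumed)

def predicted_latency_ms(scaleout: int, rate: float) -> float:
    """Latency grows sharply as utilization approaches 1 (queueing-style shape)."""
    utilization = rate / (scaleout * CAPACITY_PER_INSTANCE)
    if utilization >= 1.0:
        return math.inf
    return BASE_LATENCY_MS / (1.0 - utilization)

def predicted_recovery_s(scaleout: int, rate: float) -> float:
    """Downtime plus the time needed to drain the backlog built up during it."""
    spare = scaleout * CAPACITY_PER_INSTANCE - rate
    if spare <= 0:
        return math.inf
    backlog = rate * FAILURE_DOWNTIME_S
    return FAILURE_DOWNTIME_S + backlog / spare

def choose_scaleout(forecast: Sequence[float], max_scaleout: int,
                    latency_target_ms: float, recovery_target_s: float) -> Optional[int]:
    """Smallest scale-out meeting both QoS targets at the forecast peak, else None."""
    peak = max(forecast)
    for scaleout in range(1, max_scaleout + 1):
        if (predicted_latency_ms(scaleout, peak) <= latency_target_ms
                and predicted_recovery_s(scaleout, peak) <= recovery_target_s):
            return scaleout
    return None

forecast = [18_000, 24_000, 31_000, 27_000]  # predicted events/s for upcoming intervals
print(choose_scaleout(forecast, max_scaleout=12,
                      latency_target_ms=250, recovery_target_s=120))  # 5
```

Sizing for the forecast peak rather than the current rate is what makes the decision proactive. With these assumed numbers, 4 instances would satisfy the latency target but not the recovery target, which is why the recovery-time model matters.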

“…This can lead to unnecessary changes that disrupt the service. Alternatively, proactive strategies [1, 2, 13, 17, 21, 27] often employ modeling techniques, using historical data to predict a near-optimal configuration setting for the workload. Nonetheless, these methods are not without their challenges, especially when historical data is limited, complicating accurate predictions in dynamic environments.…”

Section: Introduction (mentioning)

Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization

Geldenhuys, Scheinert, Kao, et al. 2024

Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering

Self Cite

Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a good Quality of Service despite variable workloads. However, selecting scale-out configurations that maximize resource utilization remains a challenge. This is especially true in environments where workloads change over time and node failures are all but inevitable. Furthermore, configuration parameters such as memory allocation and checkpointing intervals impact performance and resource usage as well. Sub-optimal configurations easily lead to high operational costs, poor performance, or unacceptable loss of service.

In this paper, we present Demeter, a method for dynamically optimizing key DSP system configuration parameters for resource efficiency. Demeter uses Time Series Forecasting to predict future workloads and Multi-Objective Bayesian Optimization to model runtime behaviors in relation to parameter settings and workload rates. Together, these techniques allow us to determine whether enough is already known about the predicted workload rate or whether short-lived parallel profiling runs should be initiated proactively to gather more data. Once trained, the models guide the adjustment of multiple, potentially dependent system configuration parameters, ensuring optimized performance and resource usage in response to changing workload rates. Our experiments on a commodity cluster using Apache Flink demonstrate that Demeter significantly improves the operational efficiency of long-running benchmark jobs.
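
The abstract describes two decisions: whether enough is known about a predicted workload rate, and which configuration to pick when several objectives compete. The sketch below illustrates both steps under loud assumptions. "Enough is known" is approximated by a hypothetical rule (a profiled rate lies within a relative tolerance of the forecast). The multi-objective choice is reduced to computing the Pareto front over measured (latency, CPU cores) pairs. This is not Bayesian optimization, and the names `Config`, `needs_profiling`, `dominates`, and `pareto_front` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class Config:
    """One candidate configuration and its observed objectives (hypothetical fields)."""
    scaleout: int
    memory_mb: int
    checkpoint_interval_s: int
    latency_ms: float  # objective 1: lower is better
    cpu_cores: float   # objective 2: lower is better

def needs_profiling(predicted_rate: float, profiled_rates: Sequence[float],
                    tolerance: float = 0.1) -> bool:
    """True if no profiled rate lies within `tolerance` (relative) of the prediction."""
    return not any(abs(r - predicted_rate) <= tolerance * predicted_rate
                   for r in profiled_rates)

def dominates(a: Config, b: Config) -> bool:
    """a dominates b if it is no worse in both objectives and better in at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.cpu_cores <= b.cpu_cores
    better = a.latency_ms < b.latency_ms or a.cpu_cores < b.cpu_cores
    return no_worse and better

def pareto_front(configs: Sequence[Config]) -> List[Config]:
    """Configurations not dominated by any other candidate."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]

candidates = [
    Config(4, 2048, 10, latency_ms=180.0, cpu_cores=4.0),
    Config(6, 2048, 10, latency_ms=120.0, cpu_cores=6.0),
    Config(6, 4096, 30, latency_ms=130.0, cpu_cores=6.5),  # dominated by the previous one
    Config(8, 4096, 30, latency_ms=95.0, cpu_cores=8.0),
]
print(needs_profiling(52_000, profiled_rates=[20_000, 35_000, 50_000]))  # False
print([c.scaleout for c in pareto_front(candidates)])  # [4, 6, 8]
```

The front only narrows the choice. Picking a single configuration from it still requires weighing latency against resource usage, for example against a latency target, which this sketch leaves out.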

“…Relatively few DSP autoscaling approaches incorporate the overhead cost of scaling decisions [21]. Phoebe chooses scale-outs that can guarantee a target recovery time [10]. Martin et al. [15] provide a self-adaptive approach for a DSP system to adjust its fault tolerance mechanism during runtime.…”

Section: Related Work (mentioning)
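
A simple backlog argument illustrates why choosing a larger scale-out can guarantee a target recovery time. This is an assumed, simplified model, not the profiling-based models Phoebe actually uses. Let $\lambda$ be the input rate, $\mu(s)$ the processing capacity at scale-out $s$, $D_{\text{down}}$ the downtime after a failure, and $D_{\text{ckpt}}$ the time elapsed since the last completed checkpoint. Input from both periods must be processed after the restart, and it can only be drained with the spare capacity $\mu(s) - \lambda$:

$$
B = \lambda \left( D_{\text{ckpt}} + D_{\text{down}} \right), \qquad
T_{\text{rec}}(s) = D_{\text{down}} + \frac{\lambda \left( D_{\text{ckpt}} + D_{\text{down}} \right)}{\mu(s) - \lambda}, \qquad \mu(s) > \lambda .
$$

For a recovery target $T^{*} > D_{\text{down}}$, requiring $T_{\text{rec}}(s) \le T^{*}$ is equivalent to

$$
\mu(s) \;\ge\; \lambda + \frac{\lambda \left( D_{\text{ckpt}} + D_{\text{down}} \right)}{T^{*} - D_{\text{down}}},
$$

so, under this model, the smallest scale-out whose capacity satisfies the bound is the cheapest one that meets the recovery target.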

Daedalus: Self-Adaptive Horizontal Autoscaling for Resource Efficiency of Distributed Stream Processing Systems

Pfister, Scheinert, Geldenhuys, et al. 2024

Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering

Distributed Stream Processing (DSP) systems are capable of processing large streams of unbounded data, offering high throughput and low latencies. To maintain a stable Quality of Service (QoS), these systems require a sufficient allocation of resources. At the same time, over-provisioning can result in wasted energy and high operating costs. Therefore, to maximize resource utilization, autoscaling methods have been proposed that aim to efficiently match the resource allocation with the incoming workload. However, determining when and by how much to scale remains a significant challenge. Given the long-running nature of DSP jobs, scaling actions need to be executed at runtime, and to maintain a good QoS, they should be both accurate and infrequent. The concept of self-adaptive systems is particularly well suited to these challenges: such systems monitor themselves and their environment, adapting to changes with minimal need for expert involvement.

This paper introduces Daedalus, a self-adaptive manager for autoscaling in DSP systems, which draws on the principles of self-adaptation to address the challenge of efficient autoscaling. Daedalus monitors a running DSP job and builds performance models, aiming to predict the maximum processing capacity at different scale-outs. When combined with time series forecasting to predict future workloads, Daedalus proactively scales DSP jobs, optimizing for maximum throughput and minimizing both latencies and resource usage. We conducted experiments using Apache Flink and Kafka Streams to evaluate the performance of Daedalus against two state-of-the-art approaches. Daedalus was able to achieve comparable latencies while reducing resource usage by up to 71%.
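
The abstract pairs a model of maximum processing capacity per scale-out with a workload forecast. The sketch below shows how the two pieces could fit together, under three stated assumptions that are not claimed to be Daedalus's actual models: capacity is linear in scale-out and fit by ordinary least squares to observed (scale-out, max throughput) pairs; the forecast is seasonal-naive, repeating the values from one season earlier; and a 10% headroom factor is applied to the forecast peak. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def fit_capacity_model(samples: Sequence[Tuple[int, float]]) -> Tuple[float, float]:
    """Least-squares fit of max throughput = slope * scaleout + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need observations at two or more different scale-outs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def seasonal_naive_forecast(history: Sequence[float], season: int, horizon: int) -> List[float]:
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("history shorter than one season")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]

def required_scaleout(forecast: Sequence[float], model: Tuple[float, float],
                      max_scaleout: int, headroom: float = 1.1) -> int:
    """Smallest scale-out whose modeled capacity covers the forecast peak plus headroom."""
    slope, intercept = model
    target = max(forecast) * headroom
    for s in range(1, max_scaleout + 1):
        if slope * s + intercept >= target:
            return s
    return max_scaleout  # capacity never suffices: fall back to the largest allowed scale-out

# Observed maximum throughput (events/s) at a few scale-outs (illustrative values).
samples = [(2, 19_000.0), (4, 41_000.0), (6, 59_000.0), (8, 81_000.0)]
model = fit_capacity_model(samples)  # slope 10200.0, intercept -1000.0

# Workload history with a season of 4 steps, kept short for readability.
history = [30_000, 45_000, 60_000, 40_000, 32_000, 47_000, 63_000, 41_000]
forecast = seasonal_naive_forecast(history, season=4, horizon=4)
print(required_scaleout(forecast, model, max_scaleout=12))  # 7
```

A linear model is the simplest choice that captures diminishing per-instance gains only through its intercept. A real capacity model would likely need a more flexible form once coordination overheads grow with scale-out.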

“…throughput does not change over time. In a closely related area of research, we published an approach which uses time series forecasting to optimize the resource utilization of DSP jobs executing in environments where the workload is expected to change over time [20]. A group of approaches focuses on determining the mean time to failure (MTTF) of cluster nodes and then adaptively fitting a checkpoint interval (CI) that minimizes the time lost due to failure [8]-[10].…”

Section: B Adaptive Checkpointing (mentioning)
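
For approaches that fit the checkpoint interval to the MTTF, a classic first-order rule is Young's approximation, CI ≈ sqrt(2 · C · MTTF), where C is the time needed to write one checkpoint. The quoted snippet does not say whether the cited approaches [8]-[10] use this exact formula, so the sketch below is only an illustration. The MTTF estimate from observed failure timestamps, the example values, and the function names are assumptions.

```python
import math
from typing import Sequence

def estimate_mttf(failure_times_s: Sequence[float]) -> float:
    """Mean time between consecutive observed failures (simple MTTF estimate)."""
    if len(failure_times_s) < 2:
        raise ValueError("need at least two failure timestamps")
    ordered = sorted(failure_times_s)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

def young_checkpoint_interval(checkpoint_cost_s: float, mttf_s: float) -> float:
    """Young's first-order approximation of the interval minimizing expected lost time."""
    return math.sqrt(2.0 * checkpoint_cost_s * mttf_s)

failures = [0.0, 7_200.0, 14_400.0, 21_600.0]  # observed failures every 2 hours (assumed)
mttf = estimate_mttf(failures)                 # 7200.0 s
print(young_checkpoint_interval(checkpoint_cost_s=4.0, mttf_s=mttf))  # 240.0
```

Because the interval scales with the square root of the MTTF, halving the MTTF shortens the recommended interval by a factor of about 1.41 rather than 2.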

Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing

Geldenhuys, Pfister, Scheinert, et al. 2022

Annals of Computer Science and Information Systems

Self Cite

Distributed Stream Processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. As such, making timely decisions based on these results depends on a system's ability to tolerate failure. Typically, these systems achieve fault tolerance and the ability to recover automatically from partial failures by implementing checkpoint and rollback recovery. However, owing to the statistical probability of partial failures occurring in these distributed environments and the variability of workloads upon which jobs are expected to operate, static configurations will often fail to meet Quality of Service constraints while keeping overhead low.

In this paper, we present Khaos, a new approach which utilizes the parallel processing capabilities of cloud orchestration technologies for the automatic runtime optimization of fault tolerance configurations in Distributed Stream Processing jobs. Our approach employs three subsequent phases that borrow from the principles of Chaos Engineering: establish the steady-state processing conditions, conduct experiments to better understand how the system performs under failure, and use this knowledge to continuously minimize Quality of Service violations. We implemented a Khaos prototype for Apache Flink and demonstrate its usefulness experimentally.
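
The third phase, using knowledge from failure experiments to minimize QoS violations, can be illustrated as a lookup over experiment results. The sketch below works under stated assumptions and is not Khaos's published algorithm. Each failure-injection run is assumed to record the workload rate, the checkpoint interval (CI), the steady-state latency, and the measured recovery time. The selection rule is hypothetical: among experiments at the workload nearest the current rate, choose the lowest-latency CI that still met the recovery-time constraint. The record fields and `select_checkpoint_interval` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class ExperimentResult:
    """Outcome of one failure-injection run (hypothetical fields)."""
    workload_rate: float        # events/s during the experiment
    checkpoint_interval_s: int
    avg_latency_ms: float       # steady-state latency, reflects checkpointing overhead
    recovery_time_s: float      # time to catch up after the injected failure

def select_checkpoint_interval(results: Sequence[ExperimentResult], current_rate: float,
                               max_recovery_s: float) -> Optional[int]:
    """Among experiments at the workload nearest `current_rate`, return the CI with the
    lowest latency that still met the recovery-time constraint, or None if none did."""
    if not results:
        return None
    nearest = min({r.workload_rate for r in results}, key=lambda w: abs(w - current_rate))
    feasible = [r for r in results
                if r.workload_rate == nearest and r.recovery_time_s <= max_recovery_s]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r.avg_latency_ms).checkpoint_interval_s

results = [
    ExperimentResult(20_000, 10, avg_latency_ms=210.0, recovery_time_s=35.0),
    ExperimentResult(20_000, 30, avg_latency_ms=160.0, recovery_time_s=55.0),
    ExperimentResult(20_000, 60, avg_latency_ms=140.0, recovery_time_s=95.0),
    ExperimentResult(40_000, 10, avg_latency_ms=260.0, recovery_time_s=60.0),
    ExperimentResult(40_000, 30, avg_latency_ms=200.0, recovery_time_s=110.0),
]
print(select_checkpoint_interval(results, current_rate=23_000, max_recovery_s=60))  # 30
```

Longer intervals lower checkpointing overhead but lengthen recovery, so the recovery constraint is what rules out the 60-second interval here despite its lower latency.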
