2022 IEEE International Conference on Web Services (ICWS) 2022
DOI: 10.1109/icws55610.2022.00041

Phoebe: QoS-Aware Distributed Stream Processing through Anticipating Dynamic Workloads

Abstract: Distributed Stream Processing systems have become an essential part of big data processing platforms. They are characterized by the high-throughput processing of near to realtime event streams with the goal of delivering low-latency results and thus enabling time-sensitive decision making. At the same time, results are expected to be consistent even in the presence of partial failures where exactly-once processing guarantees are required for correctness. Stream processing workloads are oftentimes dynamic in na…

Citations: cited by 6 publications (11 citation statements)
References: 23 publications (27 reference statements)
“…The job was modified to enable it to consume events from and publish results to separate Apache Kafka topics. To simulate workload variations, represented by the changing number of vehicles over time, we used the Sumo simulation tool to generate a 24-hour workload dataset, specifically employing the TAPASCologne scenario 13. Similar to the YSB experiment, we reduced this to 18 hours by sub-sampling every 4th data point and then repeating the resulting workload three times.…”
Section: Top Speed Windowing (TSW) Experiment
Citation type: mentioning
confidence: 99%
“…Petrov et al. [27] detail a model that bases scaling actions on latency measurements, and DS2 [21] uses historical and real-time data for workload forecasting to dynamically scale streaming dataflows. In our previous work with Phoebe [13], initial profiling was conducted to establish models that map scale-out and workload rates to latency and recovery times. TSF was then employed to predict future workloads, allowing for dynamic rescaling of resources aimed at maintaining stable latencies and achieving optimal recovery times.…”
Section: Stream Processing Optimization
Citation type: mentioning
confidence: 99%
“…Relatively few DSP autoscaling approaches incorporate the overhead cost of scaling decisions [21]. Phoebe chooses scale-outs that can guarantee a target recovery time [10]. Martin et al. [15] provide a self-adaptive approach for a DSP system to adjust its fault tolerance mechanism during runtime.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…throughput does not change over time. In a closely related area of research, we published an approach which uses time series forecasting to optimize the resource utilization of DSP jobs executing in environments where the workload is expected to change over time [20]. A group of approaches focuses on determining the mean time to failure (MTTF) of cluster nodes and then adaptively fitting a CI that minimizes the time lost due to failure [8]-[10].…”
Section: B. Adaptive Checkpointing
Citation type: mentioning
confidence: 99%