2022
DOI: 10.15439/2022f225

Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing

Abstract: Distributed stream processing systems are becoming an increasingly essential part of Big Data processing platforms as users grow ever more reliant on their ability to provide fast access to new results. Making timely decisions based on these results therefore depends on a system's ability to tolerate failure. Typically, these systems achieve fault tolerance, and the ability to recover automatically from partial failures, by implementing checkpoint and rollback recovery. However, owing to the statistical prob…
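The abstract names checkpoint and rollback recovery as the usual way these systems tolerate partial failures. The sketch below illustrates that generic mechanism in a single Python process. It is not the Khaos approach, which the truncated abstract does not describe. The counting operator, the fixed event-count checkpoint interval and the `fail_at` crash simulation are all hypothetical. Operator state is snapshotted together with the source offset, so after a crash the operator restores the last snapshot and replays the events received since then.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CountingOperator:
    """Hypothetical stateful operator that counts events per key."""
    counts: dict = field(default_factory=dict)

    def process(self, key: str) -> None:
        self.counts[key] = self.counts.get(key, 0) + 1


@dataclass
class Checkpoint:
    source_offset: int  # position in the replayable source to resume from
    state: dict         # copy of the operator state at that offset


def run(events, checkpoint_interval, fail_at=None):
    """Process `events`, checkpointing every `checkpoint_interval` events.

    If `fail_at` is given, simulate one crash right after that many events,
    roll back to the latest checkpoint and replay the source from its offset.
    Returns (final operator state, number of events replayed).
    """
    op = CountingOperator()
    latest = Checkpoint(source_offset=0, state={})
    offset, replayed, crashed = 0, 0, False
    while offset < len(events):
        op.process(events[offset])
        offset += 1
        if offset % checkpoint_interval == 0:
            # Snapshot state and source position together.
            latest = Checkpoint(offset, copy.deepcopy(op.counts))
        if fail_at is not None and offset == fail_at and not crashed:
            crashed = True
            # Rollback: discard in-memory state, restore the snapshot and
            # rewind the source so the lost events are processed again.
            op = CountingOperator(copy.deepcopy(latest.state))
            replayed = offset - latest.source_offset
            offset = latest.source_offset
    return op.counts, replayed
```

Running it with different `checkpoint_interval` values shows the basic trade-off that checkpoint tuning has to balance. Frequent checkpoints shorten the replay after a failure, but they add snapshot overhead during normal operation.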

Cited by 4 publications (3 citation statements).
References: 32 publications.
“…The Traffic Monitoring job is an IoT use case that calculates the average speed of moving vehicles in a particular radius adapted from the IoT Vehicles Experiment [9]. The job reads JSON vehicle events from a Kafka source, filters out events not contained within a radius of interest, calculates the average speed of vehicles in a ten second tumbling window, and enriches the vehicle information before outputting to a Kafka sink.…”
Section: Traffic Monitoring (mentioning; confidence: 99%)
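The Traffic Monitoring job in this citation statement is a four-stage pipeline: parse, filter by radius, window and average, then enrich. The minimal Python sketch below works under several assumptions. In-memory lists stand in for the Kafka source and sink. The event fields (`id`, `ts` in seconds, `lat`, `lon`, `speed`) are hypothetical. The radius test uses great-circle (haversine) distance, and the enrichment table is also hypothetical. The snippet does not say whether the average is taken per vehicle or across all vehicles, so this version averages per vehicle within each window.

```python
import json
import math
from collections import defaultdict

WINDOW_SECONDS = 10  # ten-second tumbling window, as in the snippet

# Hypothetical enrichment table: vehicle id -> extra attributes.
VEHICLE_INFO = {"v1": {"type": "car"}, "v2": {"type": "bus"}}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def traffic_monitoring(raw_events, center, radius_km):
    # 1. Parse JSON events (stand-in for the Kafka source).
    events = [json.loads(raw) for raw in raw_events]
    # 2. Drop events outside the radius of interest.
    nearby = [e for e in events
              if haversine_km(e["lat"], e["lon"], *center) <= radius_km]
    # 3. Assign each event to exactly one non-overlapping window and
    #    accumulate speed per (window start, vehicle id).
    sums = defaultdict(lambda: [0.0, 0])
    for e in nearby:
        window_start = (e["ts"] // WINDOW_SECONDS) * WINDOW_SECONDS
        acc = sums[(window_start, e["id"])]
        acc[0] += e["speed"]
        acc[1] += 1
    # 4. Average, enrich and emit (stand-in for the Kafka sink).
    results = []
    for (window_start, vid), (total, n) in sorted(sums.items()):
        record = {"window_start": window_start, "id": vid, "avg_speed": total / n}
        record.update(VEHICLE_INFO.get(vid, {}))
        results.append(record)
    return results
```

Because the windows are tumbling, each event belongs to exactly one window. That window is identified by rounding the event's timestamp down to a multiple of ten seconds.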
“…The Yahoo Streaming Benchmark workload is taken from realistic online advertising click-through rate data³. Lastly, the traffic monitoring workload was generated based on the TAPASCologne scenario and SUMO to simulate realistic traffic patterns in the city of Berlin [9]. Each job was benchmarked to determine the maximum throughput achievable with 12 workers.…”
Section: Workload Generation (mentioning; confidence: 99%)
“…The metrics of the host machine are eventually gathered via Prometheus queries and saved to file. As a source for publishing to the message queue, we employ the IoT Vehicles experiment dataset created and published in [28], which reports amounts of moving cars at intervals of 1 second. At each point in time, the corresponding vehicle amount is read and used for publishing the same amount of messages to the message queue.…”
Section: A. Data Acquisition (mentioning; confidence: 99%)
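The replay procedure quoted in this citation statement reads one vehicle count per second and publishes that many messages. It can be sketched in Python as shown below. Here `publish` is a hypothetical stand-in for a real message-queue producer. The `clock` and `sleep` parameters are injectable only so the pacing can be tested without waiting. The loop sleeps until absolute interval boundaries, rather than for a fixed second after each batch, so that time spent publishing does not accumulate as drift.

```python
import time
from typing import Callable, Iterable


def replay_vehicle_counts(
    counts: Iterable[int],
    publish: Callable[[int, int], None],
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Publish `count` messages for each per-second entry in `counts`.

    `publish(step, seq)` is a hypothetical stand-in for a message-queue
    producer call: `step` is the index of the time point and `seq` the
    message number within it. Returns the total number of messages sent.
    """
    total = 0
    start = clock()
    for step, count in enumerate(counts):
        for seq in range(count):
            publish(step, seq)
        total += count
        # Wait until the next absolute interval boundary.
        next_tick = start + (step + 1) * interval_s
        delay = next_tick - clock()
        if delay > 0:
            sleep(delay)
    return total
```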