2021
DOI: 10.1002/spe.3021

Optimizing checkpoint‐based fault‐tolerance in distributed stream processing systems: Theory to practice

Abstract: Fault-tolerance is an essential part of a stream processing system that guarantees data analysis can continue even after failures. State-of-the-art distributed stream processing systems use checkpointing to support fault-tolerance for stateful computations, where the state of the computations is periodically persisted. However, the frequency of checkpoints affects the performance (utilization, latency, and throughput) of the system, as the checkpointing process consumes resources and time that can b…

Citations: cited by 7 publications (1 citation statement)
References: 44 publications
“…In a distributed system consisting of a set of n processes following the crash failure model, the processes exchange messages with each other through reliable FIFO communication channels [17]. Each process p takes its i-th local checkpoint (i ≥ 0), $Ck_p^{i}$, to record its current state in stable storage for recovery.…”
Section: Assumptions
Citation type: mentioning
confidence: 99%
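
The local checkpointing described in the citation statement above can be illustrated with a minimal Python sketch. It is not taken from either paper: the `Process` class, its method names, and the use of a per-process directory of JSON files as a stand-in for stable storage are all assumptions made for illustration, and message exchange over FIFO channels is left out entirely.

```python
import json
import os
import tempfile


class Process:
    """Hypothetical process p that records indexed local checkpoints."""

    def __init__(self, pid, storage_root):
        self.pid = pid
        # Assumption: one directory per process stands in for stable storage.
        self.storage_dir = os.path.join(storage_root, str(pid))
        os.makedirs(self.storage_dir, exist_ok=True)
        self.state = {}      # in-memory state, lost when the process crashes
        self.next_index = 0  # i, the index of the next checkpoint (i >= 0)

    def _path(self, index):
        return os.path.join(self.storage_dir, f"{index}.json")

    def take_checkpoint(self):
        """Persist the current state as the i-th local checkpoint; return i."""
        index = self.next_index
        # Write to a temporary file first, then rename it atomically, so a
        # crash in the middle of a write never leaves a partial checkpoint.
        fd, tmp_path = tempfile.mkstemp(dir=self.storage_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self._path(index))
        self.next_index = index + 1
        return index

    def recover(self):
        """Reload the latest checkpoint; return its index, or None if absent."""
        indices = [
            int(name[: -len(".json")])
            for name in os.listdir(self.storage_dir)
            if name.endswith(".json")
        ]
        if not indices:
            self.state = {}
            self.next_index = 0
            return None
        latest = max(indices)
        with open(self._path(latest)) as f:
            self.state = json.load(f)
        self.next_index = latest + 1
        return latest
```

Under the crash failure model, a restarted process would be a fresh `Process` object with the same `pid` and storage root; calling `recover()` on it restores the last persisted state and discards any updates made after that checkpoint.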