2008
DOI: 10.14778/1453856.1453920

Fault-tolerant stream processing using a distributed, replicated file system

Abstract: We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal stream processing and leaves more resources available for normal stream processing than previous proposals. Like several previous schemes, SGuard is based on rollback recovery [18]: it checkpoints the state of stream processing nodes periodically and restarts failed nodes from their most recent checkpoints. In contrast to previous prop…
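
The rollback-recovery idea named in the abstract (periodic checkpoints, restart from the most recent one) can be illustrated with a minimal sketch. This is not SGuard's implementation: the class names `CountingOperator` and `CheckpointedNode`, the in-memory checkpoint, and the replay of logged input after restoring are all assumptions made for illustration. SGuard itself stores checkpoints in a distributed, replicated file system, which this toy example does not model.

```python
import copy


class CountingOperator:
    """Hypothetical stateful operator: counts occurrences of each key."""

    def __init__(self):
        self.state = {}

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1


class CheckpointedNode:
    """Hypothetical node that checkpoints operator state every `interval` tuples."""

    def __init__(self, operator, interval):
        self.operator = operator
        self.interval = interval
        self.processed = 0
        # Checkpoint = (number of tuples reflected in the state, copy of the state).
        # Kept in memory here; a real system would write it to durable storage.
        self.checkpoint = (0, copy.deepcopy(operator.state))

    def process(self, key):
        self.operator.process(key)
        self.processed += 1
        if self.processed % self.interval == 0:
            self.checkpoint = (self.processed, copy.deepcopy(self.operator.state))

    def recover(self, input_log):
        """Restore the latest checkpoint, then replay the tuples logged after it.

        Assumes the operator is deterministic and that the input is replayed
        in the original order.
        """
        position, saved_state = self.checkpoint
        self.operator.state = copy.deepcopy(saved_state)
        self.processed = position
        for key in input_log[position:]:
            self.process(key)


def run_demo():
    stream = ["a", "b", "a", "c", "a", "b", "c"]
    node = CheckpointedNode(CountingOperator(), interval=3)
    for key in stream:
        node.process(key)
    expected = dict(node.operator.state)

    # Simulate a crash that wipes the volatile operator state.
    node.operator.state = {}
    node.recover(stream)
    return expected, dict(node.operator.state)


if __name__ == "__main__":
    expected, recovered = run_demo()
    print(expected == recovered)
```

The sketch recovers the pre-failure state only because the operator is deterministic and the replayed input arrives in the same order, which is the assumption several of the citing papers quoted below make explicit.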

Citations: cited by 61 publications (48 citation statements)
References: 35 publications
“…We assume that individual operator partitions are deterministic, i.e., an operator partition produces an identical output when it processes the same input tuples in the same order. This is a common assumption [20,44,36,2,35,23] and most relational operators are deterministic. In a distributed system, however, the order in which input tuples reach an operator partition may not be deterministic.…”
Section: Model and Assumptions (mentioning)
confidence: 99%
“…As an optimization, operators can checkpoint only delta-changes of their state [11]. Other optimizations are also possible [11,19,23] and can be used with our framework.…”
Section: Concrete Framework Instance (mentioning)
confidence: 99%
“…For this, we are implementing concurrent copy-on-write data structures [14]. Further, due to other parallel, replicated data flows, any side-effect of operator migration is very likely to be hidden from the end-clients.…”
Section: Replication-aware Adaptation (mentioning)
confidence: 99%
“…These techniques either execute all the operator replicas [4,10,20] or consistently copy the state of a subset of replicas onto other replicas [10,11,14]. In contrast to these solutions, our iFlow conducts detouring as soon as it notices a transmission problem.…”
Section: Related Work (mentioning)
confidence: 99%