2011
DOI: 10.1145/1883612.1883616

The failure detector abstraction

Abstract: A failure detector is a fundamental abstraction in distributed computing. This paper surveys this abstraction through two dimensions. First we study failure detectors as building blocks to simplify the design of reliable distributed algorithms. In particular, we illustrate how failure detectors can factor out timing assumptions to detect failures in distributed agreement algorithms. Second, we study failure detectors as computability benchmarks. That is, we survey the weakest failure detector question and illu…

Citations: cited by 43 publications (27 citation statements)
References: 109 publications

“…It is believed that the failure detector abstraction is a fundamental one and should sit as a first-class citizen of a distributed programming library. Additionally, failure detectors are important because of the possibility to classify problems in distributed computing [10].…”
Section: B Failure Detectors (mentioning)
confidence: 99%
“…The literature on failure detectors is rich; see for example the recent surveys [22,34]. We focus on the eventual strong ◇S failure detector that is known to be the weakest failure detector required to solve consensus [8,9] in message-passing systems, when the majority of the processes are non-crashed.…”
Section: Consensus (mentioning)
confidence: 99%
“…However, given the FLP impossibility [4], i.e., consensus can not be solved deterministically in asynchronous distributed systems in which even a single process can fail by crashing, deploying high-available distributed systems on the Internet is a challenge. In order to circumvent the impossibility of solving consensus in asynchronous distributed systems, Chandra and Toueg introduced failure detectors based on timeouts [5][6][7].…”
Section: Introduction (mentioning)
confidence: 99%
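
The last statement above refers to failure detectors based on timeouts. As a rough illustration of that idea only, here is a minimal Python sketch of a heartbeat detector that suspects a peer once its heartbeat is overdue and enlarges that peer's timeout whenever a suspicion turns out to be wrong. It is not taken from the surveyed paper or from Chandra and Toueg's algorithms; the class name `HeartbeatDetector`, its fields, the doubling backoff, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector (illustrative sketch only)."""

    peers: list[str]
    initial_timeout: float = 1.0   # assumed starting timeout, in seconds
    backoff: float = 2.0           # assumed growth factor after a false suspicion
    last_seen: dict[str, float] = field(default_factory=dict)
    timeout: dict[str, float] = field(default_factory=dict)
    suspected: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Every peer starts unsuspected, as if last heard from at time 0.
        for p in self.peers:
            self.last_seen[p] = 0.0
            self.timeout[p] = self.initial_timeout

    def on_heartbeat(self, peer: str, now: float) -> None:
        """Record a heartbeat; retract a wrong suspicion and grow the timeout."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            self.suspected.discard(peer)
            self.timeout[peer] *= self.backoff

    def check(self, now: float) -> set[str]:
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for p in self.peers:
            if now - self.last_seen[p] > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


fd = HeartbeatDetector(peers=["p1", "p2"])
fd.on_heartbeat("p1", now=0.5)
fd.on_heartbeat("p2", now=0.5)
print(sorted(fd.check(now=1.2)))  # [] : both heartbeats are 0.7 s old
print(sorted(fd.check(now=2.0)))  # ['p1', 'p2'] : both are 1.5 s old
fd.on_heartbeat("p2", now=2.1)    # p2 was only slow; its timeout becomes 2.0
print(sorted(fd.check(now=3.0)))  # ['p1'] : p2 is within its larger timeout
```

The algorithm that consumes `check()` never sees a timeout value: all timing assumptions stay inside the detector, which is the separation the abstract describes as factoring out timing assumptions.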