Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles 2011
DOI: 10.1145/2043556.2043583

Detecting failures in distributed systems with the Falcon spy network

Abstract: A common way for a distributed system to tolerate crashes is to explicitly detect them and then recover from them. Interestingly, detection can take much longer than recovery, as a result of many advances in recovery techniques, making failure detection the dominant factor in these systems' unavailability when a crash occurs. This paper presents the design, implementation, and evaluation of Falcon, a failure detector with several features. First, Falcon's common-case detection time is sub-second, which keeps un…

Cited by 69 publications (48 citation statements)
References 38 publications (42 reference statements)
“…If a client fails during a write the system might end up in a vulnerable state where only a limited number of further failures can be tolerated. To limit this vulnerable time inconsistencies must be detected as fast as possible, by other client accesses or a failure detector [3,16]. Clients send their read or write requests directly to the data servers; the access granularity is one block.…”
Section: Distributed System Model; citation type: mentioning
confidence: 99%
“…The information stored in network was used for detection of loss and errors in the network. Leners et al [17] presented a FALCON framework for network monitoring in which the error or malicious activities were tracked for the network in a distributed system. However, the end points and their data packets were not the prime focus.…”
Section: Literature Review; citation type: mentioning
confidence: 99%
“…Several works have presented modifications and additions to ZooKeeper (e.g., [21,23,25,36,40,55]), but (almost) none of them deals with changing the service's programming model. A notable exception is a recent short paper by Kalantari et al [36] which identifies inefficiencies related to ZooKeeper's watch mechanism.…”
Section: Related Work; citation type: mentioning
confidence: 99%