2009 17th Euromicro International Conference on Parallel, Distributed and Network-Based Processing 2009
DOI: 10.1109/pdp.2009.33

FAST Failure Detection Service for Large Scale Distributed Systems

Cited by 5 publications (6 citation statements)
References 17 publications
“…This requires an appropriate fault detection algorithm, which can determine the new graph topology and, importantly, the new number of nodes. Such algorithms are typically executed intermittently, giving rise to a certain detection latency, .…”
Section: Fault Tolerant Implementation (mentioning)
confidence: 99%
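The statement above ties detection latency to the fact that the detection algorithm runs only intermittently. A minimal sketch of that relationship follows. It assumes detection runs at fixed multiples of a period and notices a failure instantly at the next run. Neither assumption comes from the quoted paper, and the function name and parameters are hypothetical.

```python
import math


def detection_latency(fail_time: float, period: float) -> float:
    """Delay between a failure and the next scheduled detection run.

    Assumption (not from the source): runs happen at 0, period, 2*period, ...
    and a run detects any failure that occurred at or before its start.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if fail_time < 0:
        raise ValueError("fail_time must be non-negative")
    # First run scheduled at or after the failure instant.
    next_run = math.ceil(fail_time / period) * period
    return next_run - fail_time


if __name__ == "__main__":
    # A node failing at t=12 with runs every 5 time units is noticed at t=15.
    print(detection_latency(12.0, 5.0))
```

Under these assumptions the latency is always smaller than the period, so shortening the period reduces the worst case at the cost of running the detector more often.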
“…The decision which recovery point should be used is based on information gathered by SIM both from the service and from RMU. Details and rationale of this algorithm (l. 7-13) are beyond the scope of this paper and were provided in [6].…”
Section: Algorithm 1 Rollback-recovery Protocol - Data Types (mentioning)
confidence: 99%
“…If an appropriate response is available, it is sent back to the client. If the request is already saved but there is no response, RMU orders CIM to repeat the request later, as the request is still executed by the service and the response is expected to arrive (l. 5-11). If the received request is not yet processed, it is saved in RMU's stable storage, supplemented with the target service's identifier and epoch number, and directed to the service's SIM module (l. 12-17).…”
Section: Algorithm 1 Rollback-recovery Protocol - Data Types (mentioning)
confidence: 99%
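The three cases in the statement above can be sketched as a small dispatch routine. This is an illustration only, not the citing paper's Algorithm 1. In-memory dictionaries stand in for RMU's stable storage, and the class name, method names, and action labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RMUSketch:
    """Hypothetical stand-in for RMU; dicts replace real stable storage."""
    service_epochs: Dict[str, int]
    # request_id -> (service_id, epoch, payload)
    requests: Dict[str, Tuple[str, int, str]] = field(default_factory=dict)
    # request_id -> response
    responses: Dict[str, str] = field(default_factory=dict)

    def handle_request(
        self, request_id: str, service_id: str, payload: str
    ) -> Tuple[str, Optional[str]]:
        # Case 1: a response is already stored, so return it to the client.
        if request_id in self.responses:
            return ("RESPONSE", self.responses[request_id])
        # Case 2: request saved but still executing; the caller (CIM in the
        # quoted text) is told to repeat the request later.
        if request_id in self.requests:
            return ("RETRY_LATER", None)
        # Case 3: new request; save it with the service identifier and
        # epoch number, then direct it to the service's SIM module.
        epoch = self.service_epochs.get(service_id, 0)
        self.requests[request_id] = (service_id, epoch, payload)
        return ("FORWARD_TO_SIM", service_id)

    def record_response(self, request_id: str, response: str) -> None:
        # Called once the service has produced its response.
        self.responses[request_id] = response
```

Checking stored responses before stored requests means a repeated request never reaches the service a second time, which is the point of saving requests before forwarding them.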
“…The systems described in [5] and [6] use a randomized monitoring topology for failure detection. On the other hand, the one described in [7] uses a structured skip ring topology. We applied the findings reported in these papers not only for failure detection but also for developing an IP address stealing topology.…”
Section: Related Studies (mentioning)
confidence: 99%