2014
DOI: 10.1587/transinf.e97.d.65

A Concurrent Partial Snapshot Algorithm for Large-Scale and Dynamic Distributed Systems

Abstract: Checkpoint-rollback recovery, which is a universal method for restoring distributed systems after faults, requires a sophisticated snapshot algorithm especially if the systems are large-scale, since repeatedly taking global snapshots of the whole system requires unacceptable communication cost. As a sophisticated snapshot algorithm, a partial snapshot algorithm has been introduced that takes a snapshot of a subsystem consisting only of the nodes that are communication-related to the initiator instead of…
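
As a rough illustration of what "communication-related to the initiator" could mean, the sketch below computes a partial-snapshot subsystem as the set of nodes reachable from the initiator through recorded message exchanges. The transitive reading of the relation, the `comm_log` structure, and the function name `partial_snapshot_members` are assumptions made for this example, not details taken from the paper.

```python
from collections import deque


def partial_snapshot_members(initiator, comm_log):
    """Return the nodes reachable from `initiator` via recorded message exchanges.

    comm_log is a hypothetical structure: an iterable of (sender, receiver) pairs.
    """
    # Build an undirected adjacency map: a message relates both endpoints.
    neighbors = {}
    for sender, receiver in comm_log:
        neighbors.setdefault(sender, set()).add(receiver)
        neighbors.setdefault(receiver, set()).add(sender)

    # Breadth-first search from the initiator over the assumed relation.
    members = {initiator}
    queue = deque([initiator])
    while queue:
        node = queue.popleft()
        for peer in neighbors.get(node, ()):
            if peer not in members:
                members.add(peer)
                queue.append(peer)
    return members
```

Under these assumptions, a log containing the messages a→b, b→c, and d→e would give initiator a the subsystem {a, b, c}, leaving d and e out of the snapshot entirely, which is the cost saving the abstract points to.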

Cited by 4 publications (12 citation statements)
References: 17 publications

“…In this section, we evaluate the performance of the proposed algorithm with the CSS algorithm [10,20]. The CSS algorithm is a representative of partial snapshot algorithms, as described in Section 2, and the two algorithms have the same properties: (1) The algorithms do not suspend an application execution on a distributed system while taking a snapshot, (2) the algorithms take partial snapshots (not snapshots of the entire system), (3) the algorithms can take multiple snapshots concurrently, and (4) the algorithms can handle dynamic network topology changes. In addition, both algorithms are based on the SSS algorithm.…”
Section: Discussion (mentioning)
confidence: 99%
“…In contrast, SSS algorithm allows execution of any applications while a snapshot is taken, with some elaborate operations based on the communication-relation. Kim et al, proposed a new partial snapshot algorithm, named Concurrent Sub-Snapshot (CSS) algorithm [11,21], based on SSS algorithm. They called the problematic situation caused by the overlap of the subsystems a collision and presented an algorithm that can resolve collisions by combining colliding SSS algorithm instances.…”
Section: Related Work (mentioning)
confidence: 99%
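
The idea of combining colliding instances can be pictured with a small sketch: snapshot instances whose member sets overlap are merged into a single group. This is only an illustration of the overlap-and-merge notion in the statement above, not the CSS algorithm itself; the `instances` mapping and the function name `merge_colliding_instances` are hypothetical.

```python
def merge_colliding_instances(instances):
    """Group snapshot instances whose subsystems overlap.

    instances is a hypothetical mapping: initiator id -> set of member node ids.
    Returns a list of disjoint member sets, one per group of colliding instances.
    """
    merged = []  # invariant: the sets in this list are pairwise disjoint
    for members in instances.values():
        current = set(members)
        remaining = []
        for group in merged:
            if group & current:
                # Overlap means a collision: absorb the existing group.
                current |= group
            else:
                remaining.append(group)
        remaining.append(current)
        merged = remaining
    return merged
```

For example, instances with members {1, 2, 3} and {3, 4} share node 3 and would be combined into {1, 2, 3, 4}, while an instance covering {7, 8} stays separate.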
“…For the practical implementation of the snapshot protocol, the system model must consider failures and asynchronous communication [27]. In [28], a partial snapshot algorithm for a subsystem, where multiple nodes concurrently initiate the snapshot algorithm, is proposed. In Snapify [29], a snapshot algorithm for offload applications on Xeon Phi manycore processors is proposed.…”
Section: Snapshot Protocols (mentioning)
confidence: 99%
“…Algorithm 1 shows the pseudocode of the proposed distributed snapshot algorithm for the active thread. Before starting a round, node i checks whether a consistent global state is collected for failedRound (lines 16-28). If the stateNodes data structure satisfies the conditions of the GS, node i saves the stateNodes data structure to latestSnapshot and builds the stateChannel data structure (lines 17-21).…”
Section: Details Of the Algorithms (mentioning)
confidence: 99%