Self-stabilizing End-to-End Communication in (Bounded Capacity, Omitting, Duplicating and non-FIFO) Dynamic Networks

Hanemann, Ariel; Schiller, Elad Michael; Sharma, Shantanu

doi:10.1007/978-3-642-33536-5_14

Cited by 30 publications

(54 citation statements)

References 33 publications


“…Our approach for composing practically-self-stabilizing algorithms assumes that the messages of the client algorithm are piggybacked by the ones of the server, and that the server algorithm can send any message independently. Also, we assume that the communication among processors relies on a self-stabilizing end-to-end protocol, such as the ones in [11,13].…”

Section: The Abstract Task Of Dolev Et Al's Labeling Scheme

mentioning

confidence: 99%

“…We assume that any p_i, p_j ∈ P have access to channel_{i,j}, which is a self-stabilizing end-to-end message delivery protocol (that is reliable FIFO) that transfers packets from p_i to p_j. Note that [11,13] present a self-stabilizing reliable FIFO message delivery protocol that tolerates packet omissions, reordering, and duplication over non-FIFO channels. The interleaving model. The processor's program is a sequence of (atomic) steps.…”

mentioning

confidence: 99%


Practically-Self-stabilizing Vector Clocks in the Absence of Execution Fairness

Salem

Schiller

2019

Networked Systems

Self Cite


Vector clock algorithms are basic wait-free building blocks that facilitate causal ordering of events. As wait-free algorithms, they are guaranteed to complete their operations within a finite number of steps. Stabilizing algorithms allow the system to recover after the occurrence of transient faults, such as soft errors and arbitrary violations of the assumptions according to which the system was designed to behave. We present the first, to the best of our knowledge, stabilizing vector clock algorithm for asynchronous crash-prone message-passing systems that can recover in a wait-free manner after the occurrence of transient faults. In these settings, it is challenging to demonstrate a finite and wait-free recovery from (communication and crash failures as well as) transient faults, bound the message and storage sizes, deal with the removal of all stale information without blocking, and deal with counter overflow events (which occur at different network nodes concurrently).

We present an algorithm that never violates safety in the absence of transient faults and provides bounded time recovery during fair executions that follow the last transient fault. The novelty is that in the absence of execution fairness, the algorithm guarantees a bound on the number of times in which the system might violate safety (while existing algorithms might block forever due to the presence of both transient faults and crash failures).

Since vector clocks facilitate a number of elementary synchronization building blocks (without requiring remote replica synchronization) in asynchronous systems, we believe that our analytical insights are useful for the design of other systems that cannot guarantee execution fairness.

Context and Motivation. Vector clocks allow reasoning about causality among events in distributed systems, for example, when constructing distributed snapshots [17]. Shapiro et al. [24] showed that vector clocks are building blocks of several conflict-free replicated data types (CRDTs). CRDTs are distributed data structures that can be shared among many replicas in asynchronous networks. All replica updates occur independently and achieve strong eventual consistency without using mechanisms for synchronization [25] or roll-back.

The industrial use of CRDTs includes globally distributed databases, such as the ones of Redis, Riak, Bet365, SoundCloud, TomTom, Phoenix, and Facebook. Some of these databases have around ten million concurrent users, ten thousand messages per second, store large volumes of data, and offer very low latency. However, while both the literature and the users demonstrate that large-scale decentralized systems can benefit from the use of CRDTs in general and vector clocks in particular, the relationship between fault-tolerance and strong eventual consistency has not received sufficient attention. Providing higher robustness degrees to CRDTs is nevertheless imperative for ensuring the availability and safety of these systems.

Providing robustness in the presence of unexpected failures, i.e., the ones that...


“…Let a_{arrival,k} ∈ R be the first step that appears after a_{depart,k} in R, if there is any such step, in which the node at p_j delivers the packet (token) that a_{depart,k} transmits the message m (if there are several such packets, consider the last to arrive). By the correctness of the end-to-end [15,17] Suppose that step a_k does not appear in R, i.e., m appears in R's starting system state. By the definition of asynchronous rounds with round-trips (Remark 2.1), within O(1) asynchronous cycles, all messages in transit to p_j arrive (or leave the communication channel).…”

Section: Correctness

mentioning

confidence: 99%

Self-Stabilizing Snapshot Objects for Asynchronous Failure-Prone Networked Systems

Georgiou

Lundström

Schiller

2019

Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing


A snapshot object simulates the behavior of an array of single-writer/multi-reader shared registers that can be read atomically. Delporte-Gallet et al. proposed two fault-tolerant algorithms for snapshot objects in asynchronous crash-prone message-passing systems. Their first algorithm is non-blocking; it allows snapshot operations to terminate once all write operations had ceased. It uses O(n) messages of O(n · ν) bits, where n is the number of nodes and ν is the number of bits it takes to represent the object. Their second algorithm allows snapshot operations to always terminate independently of write operations. It incurs O(n^2) messages.

The fault model of Delporte-Gallet et al. considers both node failures (crashes). We aim at the design of even more robust snapshot objects. We do so through the lenses of self-stabilization, a very strong notion of fault-tolerance. In addition to Delporte-Gallet et al.'s fault model, a self-stabilizing algorithm can recover after the occurrence of transient faults; these faults represent arbitrary violations of the assumptions according to which the system was designed to operate (as long as the code stays intact).

In particular, in this work, we propose self-stabilizing variations of Delporte-Gallet et al.'s non-blocking algorithm and always-terminating algorithm. Our algorithms have similar communication costs to the ones by Delporte-Gallet et al. and O(1) recovery time (in terms of asynchronous cycles) from transient faults. The main differences are that our proposal considers repeated gossiping of O(ν) bits messages and deals with bounded space (which is a prerequisite for self-stabilization). Lastly, we explain how to extend the proposed solutions to reconfigurable ones.


“…Aside from these results, there has been research on self-stabilizing link layers [8,9] which even guarantee FIFO-delivery, thus giving stronger guarantees than required for the relay layer.…”

Section: Related Work

mentioning

confidence: 99%

Relays: A New Approach for the Finite Departure Problem in Overlay Networks

Scheideler

Setzer

2018

Lecture Notes in Computer Science


A fundamental problem for overlay networks is to safely exclude leaving nodes, i.e., the nodes requesting to leave the overlay network are excluded from it without affecting its connectivity. To rigorously study self-stabilizing solutions to this problem, the Finite Departure Problem (FDP) has been proposed [12]. In the FDP we are given a network of processes in an arbitrary state, and the goal is to eventually arrive at (and stay in) a state in which all leaving processes irrevocably decided to leave the system while for all weakly-connected components in the initial overlay network, all staying processes in that component will still form a weakly-connected component. In the standard interconnection model, the FDP is known to be unsolvable by local control protocols, so oracles have been investigated that allow the problem to be solved [12]. To avoid the use of oracles, we introduce a new interconnection model based on relays. Despite the relay model appearing to be rather restrictive, we show that it is universal, i.e., it is possible to transform any weakly-connected topology into any other weakly-connected topology, which is important for being a useful interconnection model for overlay networks. Apart from this, our model allows processes to grant and revoke access rights, which is why we believe it to be of interest beyond the scope of this paper. We show how to implement the relay layer in a self-stabilizing way and identify properties protocols need to satisfy so that the relay layer can recover while serving protocol requests.
