Abstract: In the Iterated Immediate Snapshot (IIS) model, the memory consists of a sequence of one-shot Immediate Snapshot (IS) objects. Each IS object can be accessed with an operation that atomically writes a value and returns a snapshot of its contents. Each process can access each IS object at most once. Processes access the sequence of IS objects one by one, asynchronously, in a wait-free manner; any number of processes can crash. Borowsky and Gafni, among others, have shown that this model is very useful for studying the usual read/write shared memory model. Its interest lies in the elegant recursive structure of its runs, which makes it easy to analyze round by round. Notably, Borowsky and Gafni have shown that the IIS model and the read/write model are equivalent for the wait-free solvability of decision tasks.
In this paper we extend the benefits of the IIS model to partially synchronous systems. Given a shared memory model enriched with a failure detector, what is an equivalent IIS model? The paper shows that an elegant way of capturing the power of a failure detector, and of other partially synchronous systems, in the IIS model is to appropriately restrict its set of runs, giving rise to the Iterated Restricted Immediate Snapshot (IRIS) model.
The benefit of the proposed approach is that new results (including new proofs of existing results) can be obtained by considering the IRIS model instead of working directly with the equivalent read/write model enriched with a given failure detector. As a case study, the paper considers a system enriched with limited-scope accuracy failure detectors, where there is a cluster of processes such that some correct process is eventually never suspected by any process in that cluster. The paper provides a new proof of Herlihy and Penso's lower bound for k-set agreement in shared memory systems augmented with a limited-scope accuracy failure detector.
The proof is based on an extension of the Borowsky-Gafni IIS simulation to encompass failure detectors, followed by a very simple topological argument.
With the IRIS model we have succeeded in capturing the partial synchrony of a failure-detector-enriched system via a fully asynchronous, round-by-round system. We thus hope to have contributed to a better understanding of fault-tolerant distributed computing.
Key-words: Algorithmic reduction, Asynchronous system, Distributed algorithm, Distributed computability, Failure detectors, Fault-tolerance, Round-based computation, Shared memory, Topology.
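To make the one-shot IS semantics described in the abstract concrete, the following is a minimal Python sketch, not the paper's construction. It assumes the standard characterization of an IS execution as an ordered partition of the participating processes into concurrency blocks: all processes of a block write, then all of them take the same snapshot. The names `immediate_snapshot`, `iterated_immediate_snapshot`, and `check_is_properties` are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, Hashable, Sequence, Tuple

# A view is the set of (process id, value) pairs returned by a snapshot.
View = FrozenSet[Tuple[int, Hashable]]


def immediate_snapshot(blocks: Sequence[Sequence[int]],
                       values: Dict[int, Hashable]) -> Dict[int, View]:
    """Simulate one one-shot IS object (illustrative assumption).

    `blocks` is an ordered partition of the participating process ids.
    Every process in a block writes its value, then every process in
    that block obtains the same snapshot of everything written so far.
    """
    written = set()
    views: Dict[int, View] = {}
    for block in blocks:
        for pid in block:
            written.add((pid, values[pid]))
        snapshot = frozenset(written)
        for pid in block:
            views[pid] = snapshot
    return views


def iterated_immediate_snapshot(schedule, inputs):
    """Run a sequence of IS objects, one ordered partition per round.

    The value a process writes in round r+1 is its view from round r.
    A process absent from a round (crashed) must stay absent afterwards.
    """
    values = dict(inputs)
    for blocks in schedule:
        values = immediate_snapshot(blocks, values)
    return values


def check_is_properties(views: Dict[int, View]) -> bool:
    """Check self-inclusion, containment, and immediacy of the views."""
    for i, view_i in views.items():
        ids_i = {pid for pid, _ in view_i}
        assert i in ids_i                                  # self-inclusion
        for j, view_j in views.items():
            assert view_i <= view_j or view_j <= view_i    # containment
            if j in ids_i:
                assert view_j <= view_i                    # immediacy
    return True


# Round 1: process 1 runs alone, then processes 0 and 2 run concurrently.
round1 = immediate_snapshot([[1], [0, 2]], {0: "a", 1: "b", 2: "c"})
check_is_properties(round1)

# Two rounds: the second IS object is accessed by all three concurrently.
final = iterated_immediate_snapshot([[[1], [0, 2]], [[0, 1, 2]]],
                                    {0: "a", 1: "b", 2: "c"})
```

In this sketch a run of the IIS model is simply a list of ordered partitions, one per round, which is the recursive, round-by-round structure the abstract refers to. A restricted model in the spirit of IRIS would correspond to admitting only some of these lists; the specific restriction is not modeled here.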
Introduction
A distributed model of computation consists of a set of n processes communicating through some medium (some form of message passing or shared memory), satisfying specific timing assumptions (process speeds and communication delays), and failure assumptions (their number and severity). A major obstacle in the development of a theory of distributed computing is the wide variety of models that can be defined (many of which represent real systems) with combinations of parameters in both the (a)synchrony and failure dimen...