Abstract. Fault-tolerant distributed algorithms are central to building reliable, spatially distributed systems. For these algorithms to actually make systems more reliable, they must themselves be correct. Unfortunately, model checking state-of-the-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems. Several problems have to be addressed before such algorithms can be verified automatically in larger systems. In this paper, we consider the modeling and verification of fault-tolerant algorithms whose control flow is governed essentially only by threshold guards. As threshold guards are widely used in fault-tolerant distributed algorithms (including Paxos), efficient methods for handling them bring us closer to this goal. As a case study, we use the reliable broadcasting algorithm by Srikanth and Toueg, which tolerates even Byzantine faults. We show how to model this basic fault-tolerant distributed algorithm in Promela such that safety and liveness properties can be verified efficiently in Spin. We also provide experimental data for other distributed algorithms.
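To make the notion of a threshold guard concrete, the following is a minimal Python sketch of a single process whose control flow depends only on comparing a message counter against thresholds. The class name `Process`, its fields, and the specific guards (relay an echo after an initial message or at least t + 1 echoes, accept after at least n − t echoes) are illustrative assumptions in the style of Srikanth and Toueg's broadcast; they are not taken from the paper's Promela model.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical process controlled only by threshold guards."""
    n: int                   # total number of processes (assumed parameter)
    t: int                   # upper bound on faulty processes (assumed parameter)
    has_init: bool = False   # whether the process received the initial message
    sent_echo: bool = False
    accepted: bool = False

    def step(self, echoes_received: int) -> None:
        """Take one step given the number of echo messages received so far."""
        # Guard 1 (assumed): relay once an init message or t + 1 echoes arrived.
        if not self.sent_echo and (self.has_init or echoes_received >= self.t + 1):
            self.sent_echo = True
        # Guard 2 (assumed): accept once at least n - t echoes arrived.
        if echoes_received >= self.n - self.t:
            self.accepted = True


p = Process(n=4, t=1)
p.step(1)   # below both thresholds: nothing happens
p.step(2)   # t + 1 = 2 echoes: the process relays
p.step(3)   # n - t = 3 echoes: the process accepts
```

The point of the sketch is that the process never inspects who sent a message, only how many were received, which is what makes threshold-guarded algorithms a natural target for counter-based modeling.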
We introduce an automated method for parameterized verification of fault-tolerant distributed algorithms. It rests on a novel parametric interval abstraction (PIA) technique, which works for systems with multiple parameters, for instance, where n and t are parameters describing the system size and the bound on the number of faulty processes, respectively. The PIA technique maps typical threshold-range intervals such as [1, t + 1) and [t + 1, n − t) to values from a finite abstract domain. By applying PIA to both the local states of the processes and the global system state, we reduce the parameterized verification problem to finite-state model checking. We demonstrate the practical feasibility of our method by verifying several variants of the well-known consistent broadcasting algorithm by Srikanth and Toueg for different fault models. To the best of our knowledge, this is the first successful automated parameterized verification of a Byzantine fault-tolerant distributed algorithm for message-passing systems.
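The interval mapping described above can be illustrated with a small Python sketch. It assumes the thresholds 0, 1, t + 1, and n − t, and the resilience condition n > 3t with t ≥ 1 so that the thresholds are strictly ordered; the function name `abstract_value` and the choice of integer indices as the abstract domain are hypothetical and only show the idea, not the paper's actual construction.

```python
def abstract_value(x: int, n: int, t: int) -> int:
    """Map a non-negative counter x to the index of the interval containing it.

    Assumed intervals: [0, 1), [1, t + 1), [t + 1, n - t), [n - t, infinity).
    """
    if t < 1 or n <= 3 * t:
        # Assumed resilience condition; it guarantees 0 < 1 < t + 1 < n - t.
        raise ValueError("sketch assumes t >= 1 and n > 3t")
    if x < 0:
        raise ValueError("counter values are non-negative")
    thresholds = [0, 1, t + 1, n - t]
    index = 0
    for i, threshold in enumerate(thresholds):
        if x >= threshold:
            index = i  # remember the largest threshold not exceeding x
    return index


# The same abstract value covers different concrete ranges for different n, t.
small = [abstract_value(x, n=4, t=1) for x in range(5)]
large = [abstract_value(x, n=10, t=3) for x in (0, 3, 4, 6, 7, 10)]
```

Because the result is always one of four indices regardless of n and t, every concrete counter in every admissible system instance lands in the same finite domain, which is what allows a finite-state model checker to be used.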
Fault-tolerant distributed algorithms are central to building reliable, spatially distributed systems. Unfortunately, the lack of a canonical, precise framework for fault-tolerant algorithms is an obstacle to both verification and deployment. In this paper, we introduce a new domain-specific framework that captures the behavior of fault-tolerant distributed algorithms adequately and precisely. At the center of our framework is a parameterized system model in which control flow automata are used for process specification. To account for the specific features and properties of fault-tolerant distributed algorithms for message-passing systems, our control flow automata are extended to model threshold guards as well as the inherent non-determinism stemming from asynchronous communication, interleavings of steps, and faulty processes. We demonstrate the adequacy of our framework in a representative case study in which we formalize a family of well-known fault-tolerant broadcasting algorithms under a variety of failure assumptions. Our case study is supported by model checking experiments with safety and liveness specifications for a fixed number of processes. In the experiments, we systematically varied the assumptions on both the resilience condition and the failure model. In all cases, our experimental results coincided with the theoretical results predicted in the distributed algorithms literature, which gives clear evidence for the adequacy of our model. In a companion paper [18], we address the new model checking techniques necessary for parametric verification of the distributed algorithms captured in our framework.