We propose a new approach for developing and deploying distributed systems, in which nodes predict distributed consequences of their actions and use this information to detect and avoid errors. Each node continuously runs a state exploration algorithm on a recent consistent snapshot of its neighborhood and predicts possible future violations of specified safety properties. We describe a new state exploration algorithm, consequence prediction, which explores causally related chains of events that lead to property violation.This article describes the design and implementation of this approach, termed CrystalBall. We evaluate CrystalBall on RandTree, BulletPrime, Paxos, and Chord distributed system implementations. We identified new bugs in mature Mace implementations of three systems. Furthermore, we show that if the bug is not corrected during system development, CrystalBall is effective in steering the execution away from inconsistent states at runtime.
The support for transactions is an essential part of a database management system (DBMS). Without this support, the developers are burdened with ensuring atomic execution of a transaction despite failures as well as concurrent accesses to the database by other transactions. Ideally, a transactional system provides serializability, which means that the outcome of concurrent transactions is equivalent to a serial execution of them. Based on experiences on lock-based implementations, nevertheless, serializability is known as an expensive feature that comes with high overhead and low concurrency. Commercial systems, hence, compromise serializability by implementing weaker guarantees such as snapshot isolation. The developers, therefore, are still burdened with the anomalies that could arise due to the lack of serializability.There have been recent attempts to enrich large-scale data stores, such as HBase and BigTable, with transactional support. Not surprisingly, inspired by traditional database management systems, serializability is usually compromised for the benefit of efficiency. For example, Google Percolator, implements lock-based snapshot isolation on top of BigTable. We show in this paper that this compromise is not necessary in lock-free implementations of transactional support. We introduce write-snapshot isolation, a novel isolation level that has a performance comparable with that of snapshot isolation, and yet provides serializability. The main insight in write-snapshot isolation is to prevent readwrite conflicts in contrast to write-write conflicts that are prevented by snapshot isolation.
Abstract-It is notoriously hard to develop dependable distributed systems. This is partly due to the difficulties in foreseeing various corner cases and failure scenarios while implementing a system that will be deployed over an asynchronous network. In contrast, reasoning about the desired distributed system behavior and the corresponding invariants is easier than reasoning about the code itself. Further, the invariants can be used for testing, theorem proving, and runtime enforcement.In this paper, we propose an approach to observe the system behavior and automatically infer invariants which reveal implementation bugs. Using our tool, Avenger, we automatically generate a large number of potentially relevant properties, check them within the time and spatial domains using traces of system executions, and filter out all but a few properties before reporting them to the developer. Our key insight in filtering is that a good candidate for an invariant is the one that holds in all but a few cases, i.e., an "almost-invariant". Our experimental results with the XORP BGP implementation demonstrate Avenger's ability to identify the almost-invariants that lead the developer to programming errors.
Byzantine fault tolerant (BFT) protocols are replication-based solutions to the problem of tolerating the arbitrary failures of software and hardware components. The essential assumption for replication is independence of failures. In this paper, we categorize four different failure independence levels that could be obtained from the cloud. Providing more level of independence comes with the cost of more delays and less bandwidth, and not a single BFT protocol fits all these deployment setups. Using experimental results, we discuss the possible appropriate BFT protocol for each category.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.