Database replication is widely used for fault tolerance and performance. However, it requires replica control to keep data copies consistent despite updates. The traditional correctness criterion for the concurrent execution of transactions in a replicated database is 1-copy-serializability. It is based on serializability, the strongest isolation level in a nonreplicated system. In recent years, however, Snapshot Isolation (SI), a slightly weaker isolation level, has become popular in commercial database systems. There exist already several replica control protocols that provide SI in a replicated system. However, most of the correctness reasoning for these protocols has been rather informal. Additionally, most of the work so far ignores the issue of integrity constraints. In this article, we provide a formal definition of 1-copy-SI using and extending a well-established definition of SI in a nonreplicated system. Our definition considers integrity constraints in a way that conforms to the way integrity constraints are handled in commercial systems. We discuss a set of necessary and sufficient conditions for a replicated history to be producible under 1-copy-SI. This makes our formalism a convenient tool to prove the correctness of replica control algorithms.
Database replication protocols need to detect, block or abort part of conflicting transactions. A possible solution is to check their writesets (and also their readsets in case a serialisable isolation level is requested), which however burdens the consumption of CPU time. This gets even worse when the replication support is provided by a middleware, since there is no direct DBMS support in that layer. We propose and discuss the use of the concurrency control support of the local DBMS for detecting conflicts between local transactions and writesets of remote transactions. This allows to simplify many database replication protocols and to enhance their performance.
Abstract. We describe a recovery protocol which boosts availability, fault tolerance and performance by enabling failed network nodes to resume an active role immediately after they start recovering. The protocol is designed to work in tandem with middleware-based eager update-everywhere strategies and related group communication systems. The latter provide view synchrony, i.e., knowledge about currently reachable nodes and about the status of messages delivered by faulty and alive nodes. That enables a fast replay of missed updates which defines dynamic database recovery partition. Thus, speeding up the recovery of failed nodes which, together with the rest of the network, may seamlessly continue to process transactions even before their recovery has completed. We specify the protocol in terms of the procedures executed with every message and event of interest and outline a correctness proof.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.