With the emergence of byte-addressable persistent memory (PM), 64-byte cacheline flush instructions and 8-byte store instructions are expected to serve as the persist and failure-atomic write operations, respectively, in place of the fsync() and write() system calls. Such a small atomic write granularity makes it challenging to keep in-PM data structures crash consistent. In addition, even if a process does not explicitly invoke clflush, dirty cachelines can be flushed to PM unexpectedly and exposed to other concurrent processes when the system recovers. Despite these challenges, high-density PM is attractive for multidimensional indexing, which helps navigate large scientific datasets efficiently, because of its high performance, durability, and large capacity. This work proposes a Failure-atomic Byte-Addressable R-tree (FBR-tree) that exploits the byte-addressability, persistence, and high performance of PM while ensuring crash consistency. We carefully control the order of store and cacheline flush instructions and prevent any single store instruction from leaving the FBR-tree inconsistent and unrecoverable. We also develop a non-blocking, lock-free range query algorithm for the proposed FBR-tree. Because FBR-tree allows read transactions to detect and ignore transient inconsistent states, multiple read transactions can access tree nodes concurrently without shared locks while write transactions modify them. Our performance study shows that FBR-tree successfully reduces legacy logging overhead, and that the lock-free range query algorithm achieves up to 9.4x higher query processing throughput than the shared-lock-based crabbing concurrency protocol.
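The ordering constraint between stores and cacheline flushes can be illustrated with a toy simulation. The sketch below is not the FBR-tree algorithm, which the abstract does not detail; it shows one common pattern under stated assumptions: an entry is written and flushed into a free slot first, and only then is it published by a single 8-byte store to a validity bitmap. `SimulatedPM`, `insert`, `valid_entries`, and the slot layout are all hypothetical names, and each address in the model stands in for one unit that is flushed as a whole.

```python
class SimulatedPM:
    """Toy model of persistent memory behind a volatile CPU cache (assumption)."""

    def __init__(self, size):
        self.durable = [0] * size   # contents that survive a crash
        self.cache = {}             # dirty units not yet written back

    def store(self, addr, value):
        self.cache[addr] = value    # a store lands in the cache first

    def load(self, addr):
        return self.cache.get(addr, self.durable[addr])

    def flush(self, addr):
        # Stand-in for a cacheline flush followed by a fence.
        if addr in self.cache:
            self.durable[addr] = self.cache.pop(addr)

    def crash(self, evicted=()):
        # Units listed in `evicted` were written back by the hardware on its
        # own before the crash; every other dirty unit is lost.
        for addr in evicted:
            self.flush(addr)
        self.cache.clear()


BITMAP = 0  # address of the 8-byte validity bitmap; slots start at address 1


def insert(pm, slot, entry):
    pm.store(1 + slot, entry)               # 1. write into a free, invisible slot
    pm.flush(1 + slot)                      # 2. make the entry durable
    bitmap = pm.load(BITMAP) | (1 << slot)
    pm.store(BITMAP, bitmap)                # 3. publish with one 8-byte store
    pm.flush(BITMAP)                        # 4. make the publication durable


def valid_entries(pm, nslots):
    # Readers and recovery trust only slots whose bit is set in the bitmap.
    bitmap = pm.load(BITMAP)
    return [pm.load(1 + i) for i in range(nslots) if (bitmap >> i) & 1]
```

In this model, a crash after step 1 or step 2 leaves a written but unpublished slot that `valid_entries` ignores, even if the hardware evicted it early. Reversing the order (bitmap first, entry second) would allow an early eviction of the bitmap to expose a slot whose contents never reached PM.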
Over the past few years, various indexes have been redesigned for byte-addressable persistent memory. In this work, we design and implement PB+tree (Pivotal B+tree), which resolves the limitations of state-of-the-art fully persistent B+trees. First, PB+tree reduces the number of expensive shift operations by up to half by managing two sub-arrays separated by a pivot key. Second, PB+tree reads cachelines in ascending order, which lets it benefit from hardware prefetchers and run faster than state-of-the-art persistent B+trees that access cachelines in non-contiguous or descending order. Third, PB+tree employs an optimistic lock-free search algorithm to avoid repeatedly visiting the same tree node. Although the optimistic lock-free search algorithm risks visiting incorrect child nodes, PB+tree guarantees correct search results through a lazy correction algorithm that follows doubly linked sibling pointers. Our performance study shows that PB+tree outperforms state-of-the-art fully persistent indexes by a large margin.
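The effect of splitting a node around a pivot key can be sketched by counting shifts. The code below is a minimal illustration, not the PB+tree node layout, which the abstract does not specify: it assumes a hypothetical `PivotNode` that keeps keys below the pivot in one sorted sub-array and the remaining keys in another, so an insertion shifts entries in only one of the two. `sorted_array_shifts` is an assumed baseline that models a node with a single sorted array.

```python
import bisect


class PivotNode:
    """Hypothetical node with two sorted sub-arrays split by a pivot key."""

    def __init__(self, pivot):
        self.pivot = pivot
        self.left = []    # sorted keys <  pivot
        self.right = []   # sorted keys >= pivot
        self.shifts = 0   # entries moved one slot to the right so far

    def insert(self, key):
        half = self.left if key < self.pivot else self.right
        pos = bisect.bisect_left(half, key)
        self.shifts += len(half) - pos   # only this sub-array is shifted
        half.insert(pos, key)

    def keys(self):
        return self.left + self.right


def sorted_array_shifts(keys):
    """Baseline: count shifts when all keys share one sorted array."""
    arr, shifts = [], 0
    for key in keys:
        pos = bisect.bisect_left(arr, key)
        shifts += len(arr) - pos
        arr.insert(pos, key)
    return shifts
```

For any insertion sequence, the pivot layout never shifts more entries than the single-array baseline, because an insertion moves only the larger keys that live in the same sub-array. How much it saves depends on how evenly the pivot divides the keys.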