The increased capacity of main memory has led to the rise of in-memory databases. With disk access eliminated, the efficiency of index structures has become critical for performance in these systems. An ideal index structure should perform well across a wide variety of workloads, be scalable, and handle large data sets efficiently. Unfortunately, our evaluation shows that most state-of-the-art index structures fail to meet these three goals. For an index to perform well with large data sets, its time complexity should ideally be independent of the key set size. To ensure scalability, critical sections should be minimized and synchronization mechanisms carefully designed to reduce cache coherence traffic. Moreover, the complex memory hierarchy in servers makes data placement and memory access patterns important for high performance across all workload types.
In this paper, we present HydraList, a new concurrent, scalable, and high-performance in-memory index structure for massive multi-core machines. The key insight behind the design of HydraList is that an index structure can be divided into two components, a search layer and a data layer, which can be updated independently, leading to lower synchronization overhead. By isolating the search layer, we are able to replicate it across NUMA nodes and reduce cache misses and remote memory accesses. As a result, our evaluation shows that HydraList outperforms other index structures across a variety of workloads and key types.
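The two-layer split described above can be illustrated with a minimal, single-threaded sketch. This is not HydraList's implementation: the names `DataNode`, `TwoLayerIndex`, `NODE_CAPACITY`, `pending`, and `apply_pending` are hypothetical, and concurrency control and NUMA replication are omitted. It only shows why the layers can be updated independently: a lookup that starts from a stale search layer still reaches the right data node by walking forward in the data layer.

```python
import bisect


class DataNode:
    """Data-layer node: a small set of keys, linked to its successor."""

    def __init__(self, anchor):
        self.anchor = anchor  # smallest key this node is responsible for
        self.items = {}       # key -> value
        self.next = None      # next data node in key order


class TwoLayerIndex:
    NODE_CAPACITY = 4  # hypothetical split threshold, chosen for illustration

    def __init__(self):
        head = DataNode(anchor=float("-inf"))
        # Search layer (assumed here to be a sorted array of anchor keys).
        self.anchors = [head.anchor]
        self.nodes = [head]
        # Data-layer splits not yet reflected in the search layer.
        self.pending = []

    def _find_node(self, key):
        # Step 1: the search layer gives a starting node; it may be stale.
        i = bisect.bisect_right(self.anchors, key) - 1
        node = self.nodes[i]
        # Step 2: the data layer corrects for staleness by walking forward.
        while node.next is not None and node.next.anchor <= key:
            node = node.next
        return node

    def insert(self, key, value):
        node = self._find_node(key)
        node.items[key] = value
        if len(node.items) > self.NODE_CAPACITY:
            self._split(node)

    def _split(self, node):
        # Move the upper half of the keys into a new successor node.
        keys = sorted(node.items)
        new_node = DataNode(anchor=keys[len(keys) // 2])
        for k in keys[len(keys) // 2:]:
            new_node.items[k] = node.items.pop(k)
        new_node.next = node.next
        node.next = new_node
        # The search layer is NOT touched here; the update is deferred.
        self.pending.append(new_node)

    def lookup(self, key):
        return self._find_node(key).items.get(key)

    def apply_pending(self):
        # Bring the search layer up to date, independently of inserts.
        for node in self.pending:
            i = bisect.bisect_left(self.anchors, node.anchor)
            self.anchors.insert(i, node.anchor)
            self.nodes.insert(i, node)
        self.pending.clear()
```

In this sketch, inserts modify only the data layer, and `apply_pending` could in principle run later or on another thread. Because the search layer is read-mostly and only needs to be approximately current, it is the kind of component that could be copied per NUMA node, which is the replication idea the abstract refers to.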