Worst-case optimal join algorithms are attractive from a theoretical point of view, as they offer asymptotically better runtime than binary joins on certain types of queries. In particular, they avoid enumerating large intermediate results by processing multiple input relations in a single multiway join. However, existing implementations incur a sizable overhead in practice, primarily because they rely on suitable ordered index structures on their input. Systems that support worst-case optimal joins often focus on a specific problem domain, such as read-only graph analytic queries, where extensive precomputation allows them to mask these costs. In this paper, we present a comprehensive implementation approach for worst-case optimal joins that is practical within general-purpose relational database management systems supporting both hybrid transactional and analytical workloads. The key component of our approach is a novel hash-based worst-case optimal join algorithm that relies only on data structures that can be built efficiently during query execution. Furthermore, we implement a hybrid query optimizer that intelligently and transparently combines both binary and multiway joins within the same query plan. We demonstrate that our approach far outperforms existing systems when worst-case optimal joins are beneficial while sacrificing no performance when they are not.
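The general idea behind a multiway worst-case optimal join (bind one attribute at a time, intersecting the candidate values from every relation that mentions it, instead of materializing pairwise intermediate results) can be illustrated on the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). The sketch below is a textbook Generic Join with the fixed attribute order a, b, c; it is not the algorithm proposed in the paper. Python dictionaries of sets stand in for hash indexes built at query time, and the helper names `build_index`, `intersect` and `triangles` are hypothetical.

```python
from collections import defaultdict


def build_index(pairs):
    """Build a two-level hash index on the fly: first attribute -> set of second attributes."""
    index = defaultdict(set)
    for x, y in pairs:
        index[x].add(y)
    return index


def intersect(*sets):
    """Intersect by iterating the smallest input and probing the others via hash lookups."""
    smallest = min(sets, key=len)
    others = [other for other in sets if other is not smallest]
    return [v for v in smallest if all(v in other for other in others)]


def triangles(r, s, t):
    """Multiway join R(a,b), S(b,c), T(a,c), binding attributes in the order a, b, c."""
    r_idx = build_index(r)  # a -> {b}
    s_idx = build_index(s)  # b -> {c}
    t_idx = build_index(t)  # a -> {c}
    out = []
    # a must appear in both R and T.
    for a in intersect(r_idx.keys(), t_idx.keys()):
        # b must extend a in R and appear as a key of S.
        for b in intersect(r_idx[a], s_idx.keys()):
            # c must extend b in S and extend a in T.
            for c in intersect(s_idx[b], t_idx[a]):
                out.append((a, b, c))
    return out


edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(sorted(triangles(edges, edges, edges)))
```

Because every candidate value is checked against all relations before the next attribute is bound, no binary intermediate result such as R ⋈ S is ever built; for the small edge list above only the triangle (1, 2, 3) survives.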
Pure in-memory database systems offer outstanding performance but degrade heavily if the working set does not fit into DRAM, which is problematic in view of declining main memory growth rates. In contrast, recently proposed memory-optimized disk-based systems such as Umbra leverage large in-memory buffers for query processing but rely on fast solid-state disks for persistent storage. They offer near in-memory performance while the working set is cached, and scale gracefully to arbitrarily large data sets far beyond main memory capacity. Past research has shown that this architecture is indeed feasible for read-heavy analytical workloads. In this paper, we continue this line of work and present a novel multi-version concurrency control approach that enables a memory-optimized disk-based system to achieve excellent performance on transactional workloads as well. Our approach exploits the fact that the vast majority of versioning information can be maintained entirely in memory without ever being persisted to stable storage, which minimizes the overhead of concurrency control. Large write transactions for which this is not possible are extremely rare, and are handled transparently by a lightweight fallback mechanism. Our experiments show that the proposed approach achieves transaction throughput up to an order of magnitude higher than competing disk-based systems, confirming its viability in a real-world setting.
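To make the notion of "versioning information kept only in memory" concrete, the following is a minimal single-threaded sketch of a generic snapshot-read scheme, assuming a global commit counter and one current version per key that would live in persistent storage, while superseded versions sit in an in-memory list and are never written out. It is not the paper's concurrency control protocol and omits the fallback for large write transactions, garbage collection and conflict detection; the class and method names (`VersionedStore`, `commit_write`, `begin_snapshot`, `read`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    value: object
    begin_ts: int  # commit timestamp of the transaction that wrote this version


@dataclass
class VersionedStore:
    # Latest committed version per key: the only state that would reach stable storage.
    storage: Dict[str, Version] = field(default_factory=dict)
    # Superseded versions per key, newest first; kept purely in memory.
    old_versions: Dict[str, List[Version]] = field(default_factory=dict)
    clock: int = 0  # global commit counter

    def commit_write(self, key: str, value: object) -> int:
        """Install a new committed version and move the previous one to memory."""
        self.clock += 1
        prev = self.storage.get(key)
        if prev is not None:
            self.old_versions.setdefault(key, []).insert(0, prev)
        self.storage[key] = Version(value, self.clock)
        return self.clock

    def begin_snapshot(self) -> int:
        """A reader sees every version committed at or before this timestamp."""
        return self.clock

    def read(self, key: str, snapshot_ts: int) -> Optional[object]:
        """Return the newest version visible to the snapshot, or None."""
        current = self.storage.get(key)
        if current is None:
            return None
        if current.begin_ts <= snapshot_ts:
            return current.value  # common case: no version chain traversal
        for version in self.old_versions.get(key, []):
            if version.begin_ts <= snapshot_ts:
                return version.value
        return None  # key did not exist yet for this snapshot


store = VersionedStore()
store.commit_write("x", 1)
snap = store.begin_snapshot()
store.commit_write("x", 2)
print(store.read("x", snap), store.read("x", store.begin_snapshot()))
```

In this sketch an old snapshot still reads `x = 1` after a newer write, yet the only state that would need to be persisted is the latest version per key; the superseded version exists solely in the in-memory list.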