Emerging hardware platforms are characterized by large degrees of parallelism, complex memory hierarchies, and increasing hardware heterogeneity. Their theoretical peak data processing performance can only be unleashed if the different pieces of systems software collaborate much more closely and if their traditional dependencies and interfaces are redesigned.
We have developed the key concepts and a prototype implementation of a novel system software stack named mxkernel. For MxKernel, efficient large scale data processing capabilities are a primary design goal. To achieve this, heterogeneity and parallelism become first-class citizens and deep memory hierarchies are considered from the very beginning. Instead of a classical “thread” model, mxkernel provides a simpler control flow abstraction: mxtasks model closed units of work, for which mxkernel will guarantee the required execution semantics, such exclusive access to a specific object in memory. They can be a very elegant abstraction also for heterogeneity and resource sharing. Furthermore, mxtasks are annotated with metadata, such as code variants (to support heterogeneity), memory access behavior (to improve cache efficiency and support memory hierarchies), or dependencies between mxtasks (to improve scheduling and avoid synchronization cost). With precisely the required metadata available, mxkernel can provide a lightweight, yet highly efficient form of resource management, even across applications, operating system, and database.
Based on the mxkernel prototype we present preliminary results from this ambitious undertaking. We argue that threads are an ill-suited control flow abstraction for our modern computer architectures and that a task-based execution model is to be favored.
Query compilation is a processing technique that achieves very high processing speeds but has the disadvantage of introducing additional compilation latencies. These latencies cause an overhead that is relatively high for short-running and high-complexity queries. In this work, we present Flounder IR and ReSQL, our new approach to query compilation. Instead of using a general purpose intermediate representation (e.g., LLVM IR) during compilation, ReSQL uses Flounder IR, which is specifically designed for database processing. Flounder IR is lightweight and close to machine assembly. This simplifies the translation from IR to machine code, which otherwise is a costly translation step. Despite simple translation, compiled queries still benefit from the high processing speeds of the query compilation technique. We analyze the performance of our approach with micro-benchmarks and with ReSQL, which employs a full translation stack from SQL to machine code. We show reductions in compilation times up to two orders of magnitude over LLVM and show improvements in overall execution time for TPC-H queries up to 5.5
$$\times $$
×
over state-of-the-art systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.