Caetano Sauer scite author profile

Abstract. The MapReduce (MR) framework has become a standard tool for performing large batch computations-usually of aggregative nature-in parallel over a cluster of commodity machines. A significant share of typical MR jobs involves standard database-style queries, where it becomes cumbersome to specify map and reduce functions from scratch. To overcome this burden, higher-level languages such as HiveQL, PigLatin, and JAQL have been proposed to allow the automatic generation of MR jobs from declarative queries. We identify two major problems of these existing solutions: (i) they introduce new query languages and implement systems from scratch for the sole purpose of expressing MR jobs; and (ii) despite solving some of the major limitations of SQL, they still lack the flexibility required by big data applications. We propose BrackitMR, an approach based on the XQuery language with extended JSON support. XQuery not only is an established query language, but also has a more expressive data model and more powerful language constructs, enabling a much greater degree of flexibility. From a system design perspective, we extend an existing single-node query processor, Brackit, adding MR as a distributed coordination layer. Such heavy reuse of the standard query processor not only provides performance, but also allows for a more elegant design which transparently integrates MR processing into a generic query engine.

show abstract

Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines

Haubenschild

Sauer

Neumann

et al. 2020

View full text Add to dashboard Cite

For decades, ARIES has been the standard for logging and recovery in database systems. ARIES offers important features like support for arbitrary workloads, fuzzy checkpoints, and transparent index recovery. Nevertheless, many modern inmemory database systems use more lightweight approaches that have less overhead and better multi-core scalability but only work well for the in-memory setting. Recently, a new class of high-performance storage engines has emerged, which exploit fast SSDs to achieve performance close to pure in-memory systems but also allow out-of-memory workloads. For these systems, ARIES is too slow whereas inmemory logging proposals are not applicable. In this work, we propose a new logging and recovery design that supports incremental and fuzzy checkpointing, index recovery, out-of-memory workloads, and low-latency transaction commits. Our continuous checkpointing algorithm guarantees bounded recovery time. Using per-thread logging with minimal synchronization, our implementation achieves near-linear scalability on multi-core CPUs. We implemented and evaluated these techniques in our LeanStore storage engine. For working sets that fit in main memory, we achieve performance close to that of an in-memory approach, even with logging, checkpointing, and dirty page writing enabled. For the out-of-memory scenario, we outperform a state-of-the-art ARIES implementation by a factor of two.

show abstract

Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, and Media Restore

Graefe

Guy²,

Sauer

2014

Synthesis Lectures on Data Management

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Caetano Sauer

Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, Media Restore, and System Failover, Second Edition

Compilation of Query Languages into MapReduce

Versatile XQuery Processing in MapReduce

Rethinking Logging, Checkpoints, and Recovery for High-Performance Storage Engines

Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, and Media Restore

Contact Info

Product

Resources

About