Building efficient query engines in a high-level language

Klonatos, Yannis; Koch, Christoph; Rompf, Tiark; Chafi, Hassan

doi:10.14778/2732951.2732959

Cited by 88 publications (78 citation statements)

References 25 publications

“…When transforming a batch of scan requests into a single shared scan, TellStore uses just-in-time compilation with LLVM to get highly-tuned machine code. In essence, TellStore combines the batch-oriented shared scan technique of [22] with LLVM code generation described in [33,27].…”

Section: Predicate Pushdown (mentioning, confidence: 99%)

Fast scans on key-value stores

et al. 2017

Key-Value Stores (KVS) are becoming increasingly popular because they scale up and down elastically, sustain high throughputs for get/put workloads and have low latencies. KVS owe these advantages to their simplicity. This simplicity, however, comes at a cost: It is expensive to process complex, analytical queries on top of a KVS because today's generation of KVS does not support an efficient way to scan the data. The problem is that there are conflicting goals when designing a KVS for analytical queries and for simple get/put workloads: Analytical queries require high locality and a compact representation of data, whereas elastic get/put workloads require sparse indexes. This paper shows that it is possible to have it all, with reasonable compromises. We studied the KVS design space and built TellStore, a distributed KVS that performs almost as well as state-of-the-art KVS for get/put workloads and orders of magnitude better for analytical and mixed workloads. This paper presents the results of comprehensive experiments with an extended version of the YCSB benchmark and a workload from the telecommunication industry.
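The citation statement above describes TellStore turning a batch of scan requests into a single shared scan. A minimal Python sketch of that batching idea follows, assuming an in-memory list of dict records; `ScanRequest` and `shared_scan` are hypothetical names. It shows only how one pass over the data can serve several predicates and does not model TellStore's storage layout or its LLVM-based JIT compilation of the combined scan.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record type: one stored value, represented as field -> int.
Record = Dict[str, int]


@dataclass
class ScanRequest:
    """One client's scan request: an id plus a predicate over a record."""
    request_id: str
    predicate: Callable[[Record], bool]


def shared_scan(table: List[Record], requests: List[ScanRequest]) -> Dict[str, List[Record]]:
    """Answer a whole batch of scan requests with a single pass over the table.

    Each record is read once and tested against every batched predicate,
    so the cost of touching the data is shared by all requests in the batch.
    """
    results: Dict[str, List[Record]] = {r.request_id: [] for r in requests}
    for record in table:          # one sequential pass over the data
        for req in requests:      # evaluate every pending predicate on this record
            if req.predicate(record):
                results[req.request_id].append(record)
    return results


# Usage: two independent scans answered by one pass.
table = [{"id": i, "price": i * 10} for i in range(5)]
batch = [
    ScanRequest("cheap", lambda r: r["price"] < 20),
    ScanRequest("even", lambda r: r["id"] % 2 == 0),
]
out = shared_scan(table, batch)
```

Here "cheap" matches the records with ids 0 and 1, and "even" matches ids 0, 2 and 4, while the table is traversed only once.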

“…Another example of research on building efficient query engines is presented in [28]. What makes the research particularly interesting for our work is the Scala language, which was selected to implement LegoBase, a query engine analysed by the authors.…”

Section: Related Work (mentioning, confidence: 99%)

Extensible, Fast And Secure Scala Expression Evaluation Engine

Janik

Janusz

2017

JAMRIS

“…Runtime code generation has become an established mechanism, used by several relational engines [6,34,36,43,45,49]. HIQUE [36] generates cache-conscious code via code templates.…”

Section: Related Work (mentioning, confidence: 99%)

“…HyPer [43] uses the LLVM compiler [37] to generate low-level machine code. LegoBase [34] goes through numerous rewriting ("staging") steps to generate C code. Proteus follows the HyPer paradigm and relies on LLVM too.…”

Section: Related Work (mentioning, confidence: 99%)

Fast queries over heterogeneous data through engine customization

2016

Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of heterogeneous datasets to gain insights. The different data models and formats pose a significant challenge to performing analysis over a combination of diverse datasets. Serving all queries using a single, general-purpose query engine is slow. On the other hand, using a specialized engine for each heterogeneous dataset increases complexity: queries touching a combination of datasets require an integration layer over the different engines. This paper presents a system design that natively supports heterogeneous data formats and also minimizes query execution times. For multi-format support, the design uses an expressive query algebra which enables operations over various data models. For minimal execution times, it uses a code generation mechanism to mimic the system and storage most appropriate to answer a query fast. We validate our design by building Proteus, a query engine which natively supports queries over CSV, JSON, and relational binary data, and which specializes itself to each query, dataset, and workload via code generation. Proteus outperforms state-of-the-art open-source and commercial systems on both synthetic and real-world workloads without being tied to a single data model or format, all while exposing users to a single query interface.
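The citation statements and abstract above describe engines (HyPer, LegoBase, Proteus) that generate code specialized to each query at run time. As a rough illustration only, the sketch below applies the same idea in Python: it emits Python source for one hypothetical filter-and-project query, with column positions and the constant resolved before execution, and compiles it with the built-in `compile` and `exec`. This is a stand-in for, not a reproduction of, the LLVM-based pipelines (HyPer, Proteus) or LegoBase's staged generation of C code; the function name `generate_query`, the tuple row format, and the query shape are assumptions.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, ...]


def generate_query(schema: List[str], filter_col: str, threshold: int,
                   project_cols: List[str]) -> Tuple[str, Callable[[List[Row]], List[Row]]]:
    """Specialize 'SELECT project_cols FROM t WHERE filter_col > threshold'.

    Column names are resolved to tuple positions and the threshold is inlined
    as a literal when the code is generated, so the generated loop performs no
    schema lookups or operator dispatch per row.
    """
    pos = {name: i for i, name in enumerate(schema)}
    f = pos[filter_col]
    proj = ", ".join(f"row[{pos[c]}]" for c in project_cols)
    src = (
        "def query(rows):\n"
        "    out = []\n"
        "    for row in rows:\n"
        f"        if row[{f}] > {threshold!r}:\n"
        f"            out.append(({proj},))\n"
        "    return out\n"
    )
    namespace: dict = {}
    # Compile the generated source and pull the specialized function out of it.
    exec(compile(src, "<generated>", "exec"), namespace)
    return src, namespace["query"]


# Usage: specialize the query to this schema and constant, then run it.
schema = ["id", "price", "qty"]
rows = [(1, 10, 5), (2, 30, 1), (3, 25, 7)]
src, query = generate_query(schema, "price", 20, ["id", "qty"])
result = query(rows)
```

Printing `src` shows the specialized loop (`if row[1] > 20:`); a different schema or constant yields a different function. In Python the benefit is only the removal of per-row interpretation, whereas the engines cited above gain most of their speed from emitting native code.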
