Tidy Tuples and Flying Start: fast compilation and fast execution of relational queries in Umbra

Kersten, Timo; Leis, Viktor; Neumann, Thomas

doi:10.1007/s00778-020-00643-4

Cited by 23 publications

(3 citation statements)

References 28 publications


“…Query compilers nowadays build upon data centric code generation [22,32], which translates query plans into an intermediate language that a compiler like LLVM can optimize and transform to machine code. Subsequent work in this area advanced the used intermediate representations (IR) to fit the needs of query processing systems [13,14,33,38]. Indexed Algebra optimizes the logical plan from a high level, where IRs focus on the lowering to machine code for the physical query plan.…”

Section: Related Work (mentioning)

confidence: 99%

Asymptotically Better Query Optimization Using Indexed Algebra

Fent, Moerkotte, Neumann

2023

Proc. VLDB Endow.

Self Cite


Query optimization is essential for the efficient execution of queries. The necessary analysis, if we can and should apply optimizations and transform the query plan, is already challenging. Traditional techniques focus on the availability of columns at individual operators, which does not scale for analysis of data flow through the query. Tracking available columns per operator takes quadratic space, which can result in multi-second optimization time for deep algebra trees. Instead, we need to re-think the naïve algebra representation to efficiently support data flow analysis. In this paper, we introduce Indexed Algebra, a novel representation of relational algebra that makes common optimization tasks efficient. Indexed Algebra enables efficient reasoning with an auxiliary index structure based on link/cut trees that support dynamic updates and queries in O(log n). This approach not only improves the asymptotic complexity, but also allows elegant and concise formulations for the data flow questions needed for query optimization. While large queries see theoretically unbounded improvements, Indexed Algebra also improves optimization time of the relatively harmless queries of TPC-H and TPC-DS by more than 1.8×.



“…As code generation can be relatively expensive, especially for ad-hoc queries that complete quickly, Umbra does not directly generate machine code but uses the intermediate representation Umbra IR. This low-level language that was inspired by LLVM [17] can then be executed in multiple ways: A virtual machine that interprets the IR, a direct translation from Umbra IR to x86 assembly called the Flying Start backend [15], and a more sophisticated translation that uses LLVM to generate optimized machine code. Umbra uses adaptive execution [16] to dynamically switch between all three approaches to achieve low latency for short queries and high throughput for long-running analytical queries.…”

Section: User-defined Operators In Code-generating Query Engines (mentioning)

confidence: 99%

“…Proceedings of the VLDB Endowment, Vol. 15 Still, most data that is eventually analyzed in special-purpose systems is originally sourced from the RDBMS. Thus, a common approach is to create ETL workflows that can accommodate the use of different systems [31].…”

Section: Introduction (mentioning)

confidence: 99%

User-defined operators

Moritz, Neumann

2022

Proc. VLDB Endow.

Self Cite


In recent years, complex data mining and machine learning algorithms have become more common in data analytics. Several specialized systems exist to evaluate these algorithms on ever-growing data sets, which are built to efficiently execute different types of complex analytics queries. However, using these various systems comes at a price. Moving data out of traditional database systems is often slow as it requires exporting and importing data, which is typically performed using the relatively inefficient CSV format. Additionally, database systems usually offer strong ACID guarantees, which are lost when adding new, external systems. This disadvantage can be detrimental to the consistency of the results. Most data scientists still prefer not to use classical database systems for data analytics. The main reason why RDBMS are not used is that SQL is difficult to work with due to its declarative and set-oriented nature, and is not easily extensible. We present User-Defined Operators (UDOs) as a concept to include custom algorithms into modern query engines. Users can write idiomatic code in the programming language of their choice, which is then directly integrated into existing database systems. We show that our implementation can compete with specialized tools and existing query engines while retaining all beneficial properties of the database system.
