Just-In-Time Data Distribution for Analytical Query Processing

Ivanova, Milena; Kersten, Martin; Groffen, Fabian

doi:10.1007/978-3-642-33074-2_16

Cited by 3 publications

(4 citation statements)

References 16 publications

“…In this configuration, we use the Mitosis and Dataflow optimizers of MonetDB to achieve efficient intraoperator parallelism [25]. This configuration demonstrates the performance that is achievable by hand-tuning operators for a multi-core CPU.…”

Section: Parallel MonetDB (mentioning)

confidence: 93%

Hardware-oblivious parallelism for in-memory column-stores

Heimel, Saecker, Pirk, et al. (2013)

Proc. VLDB Endow.

The multi-core architectures of today's computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: Current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We propose an alternative design for a parallel database engine, based on a single set of hardware-oblivious operators, which are compiled down to the actual hardware at runtime. This design reduces the development overhead for parallel database engines, while achieving competitive performance to hand-tuned systems. We provide a proof-of-concept for this design by integrating operators written using the parallel programming framework OpenCL into the open-source database MonetDB. Following this approach, we achieve efficient, yet highly portable parallel code without the need for optimization by hand. We evaluated our implementation against MonetDB using TPC-H derived queries and observed a performance that rivals that of MonetDB's query execution on the CPU and surpasses it on the GPU. In addition, we show that the same set of operators runs nearly unchanged on a GPU, demonstrating the feasibility of our approach.

“…Note that there is no guarantee that intermediate objects may actually fit within the amount of DRAM available. However, the sizes of intermediate data sets can be controlled by fragmenting the query execution plan [23]. We will demonstrate that this allows us to make excellent use of available DRAM.…”

Section: Frequently Accessed Objects (mentioning)

confidence: 99%

“…Figure 3 also shows the impact of query plan fragmentation. Query plan fragmentation is a technique to enlarge the degree of parallelism within the query plan [23]. It breaks down the columns in smaller fragments, resulting in more operations in the query plan that are independent of one another.…”

Section: Operators Causing Main Memory Accesses (mentioning)

confidence: 99%

Fast and Energy-Efficient OLAP Data Management on Hybrid Main Memory Systems

Hassan, Nikolopoulos, Vandierendonck (2019)

IEEE Trans. Comput.

This paper studies the problem of efficiently utilizing hybrid memory systems, consisting of both Dynamic Random Access Memory (DRAM) and novel Non-Volatile Memory (NVM), in database management systems (DBMS) for online analytical processing (OLAP) workloads. We present a methodology to determine the database operators that are responsible for most main memory accesses. Our analysis uses both cost models and empirical measurements. We develop heuristic decision procedures to allocate data in hybrid memory at the time that the data buffers are allocated, depending on the expected memory access frequency. We implement these heuristics in the MonetDB column-oriented database and demonstrate performance improvement and energy-efficiency as compared to state-of-the-art application-agnostic hybrid memory management techniques.

Index Terms: Non-volatile memory, hybrid main memory, database management system

“…In addition to common strategic optimizations, MonetDB has a set of optimizers for parallel query plan generation [37]. The mitosis optimizer partitions the largest input columns into several separate columns based on size estimation heuristics and the available amount of CPU cores and main memory.…”

Section: Parallel Query Plans (mentioning)

confidence: 99%

Vectorized UDFs in Column-Stores

Raasveldt, Mühleisen (2016)

Proceedings of the 28th International Conference on Scientific and Statistical Database Management

An ever-increasing amount of data is gathered by companies, government entities and individuals. Analysing and making sense of that data is a crucial task. A task that is getting more important as there is a shift towards more data-driven decision making in society.

The tools used by data scientists for ad-hoc data analysis are scripting languages, of which R and Python are the most popular. Scripting languages are flexible, easy to use and have a large existing code base for data analytics.

Relational Database Management Systems (RDBMS) are the de-facto standard for storing tabular data. RDBMS have numerous advantages when storing tabular data, amongst which are ACID properties, scalability, data validation and automatic parallelization.

If the user wants to use data stored in a RDBMS in a scripting language, the data has to be transferred from the RDBMS to the scripting language. The standard solution is a loosely coupled approach between the scripting language and the database using an ODBC connector. To transfer the data to the scripting language, the data is exported from the database and copied, often over a network connection, and converted between the differing formats of the database and the scripting language.

This loose-coupling approach has significant performance implications, especially when transferring data over the network. In addition, the lack of a tight integration has led to data management features being re-implemented from scratch within scripting languages, by libraries such as Pandas and Dplyr.

The main contribution of this thesis is research towards how a scripting language can be tightly integrated into a columnar data management system. We present a new system, MonetDB/Python, which deeply integrates the Python scripting language into MonetDB, an open-source relational column store.

By using this system, users can execute arbitrary Python functions as part of relational SQL queries inside the database process. This significantly reduces the cost of data transfer, and allows for automatic parallelization of scripting language functions.

We show that our method is not only faster than current RDBMS connectors, but also that it is faster than native storage solutions in Python. MonetDB/Python allows us to combine the scalability and power of a RDBMS with the flexibility of a scripting language without the drawback of slow transfer speed.
