Locality-aware Partitioning in Parallel Database Systems

Zamanian, Erfan; Binnig, Carsten; Salama, Abdallah

doi:10.1145/2723372.2723718

Cited by 64 publications

(27 citation statements)

References 17 publications

“…Baselines. We first compared the partitionings found by our approaches to heuristics that are typically used by a database administrator [25]. For both simple and more complex star schemata (SSB and TPC-DS) this means that usually fact tables are co-partitioned with either the most frequently joined dimension table (Heuristic 1) or the largest dimension table (Heuristic 2).…”

Section: Workloads Setup and Baselines (mentioning)

confidence: 99%
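The two baseline heuristics in the statement above can be illustrated with a minimal sketch. The function name, the input shapes, and the sample table sizes are hypothetical and are not taken from the citing paper; the sketch only shows how each heuristic would pick the dimension table that a fact table is co-partitioned with.

```python
from collections import Counter

def pick_copartition_dimension(dim_sizes, workload_joins, heuristic):
    """Return the dimension table a fact table would be co-partitioned with.

    dim_sizes:      {dimension_name: row_count} (hypothetical input)
    workload_joins: one dimension name per fact-dimension join seen in the
                    workload (hypothetical input)
    heuristic:      1 = most frequently joined dimension,
                    2 = largest dimension
    """
    if heuristic == 1:
        counts = Counter(workload_joins)
        # Iterating in sorted order makes ties resolve to the first name.
        return max(sorted(counts), key=lambda d: counts[d])
    if heuristic == 2:
        return max(sorted(dim_sizes), key=lambda d: dim_sizes[d])
    raise ValueError("heuristic must be 1 or 2")

# Illustrative numbers only, not measurements from SSB or TPC-DS.
dim_sizes = {"customer": 30_000, "part": 200_000, "date": 2_556}
workload_joins = ["date", "date", "customer", "part", "date"]

h1 = pick_copartition_dimension(dim_sizes, workload_joins, 1)
h2 = pick_copartition_dimension(dim_sizes, workload_joins, 2)
print(h1, h2)
```

In this toy input the two heuristics disagree: the smallest dimension is joined most often, while a different dimension is the largest, which is the kind of conflict an automated advisor has to resolve.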

“…Since this technique can be exploited if a system supports hash-partitioning by any attribute, most partitioning advisors and also our technique indirectly make use of REF-partitioning. Zamanian et al. [25] extend this approach such that even more locality can be obtained but at the cost of higher replication. For this, the database has to support predicate-based reference partitioning.…”

Section: Related Work (mentioning)

confidence: 99%
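The trade-off named in the statement above (more locality at the cost of more replication) can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the cited paper: the helper names and the sample tables are hypothetical, keys are small integers so that a modulo stands in for a hash function, and rows without a join partner are placed in partition 0 purely as a convention of this sketch.

```python
def hash_partition(rows, key, n):
    """Seed table: each row lands in exactly one of n partitions."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)  # modulo stands in for a hash function
    return parts

def reference_partition(rows, ref_parts, join_col, ref_col):
    """Copy each row into every partition of the referenced table that
    holds a join partner under the equi-join join_col = ref_col."""
    parts = [[] for _ in ref_parts]
    for row in rows:
        targets = [
            i for i, part in enumerate(ref_parts)
            if any(r[ref_col] == row[join_col] for r in part)
        ]
        # Assumption of this sketch: partnerless rows go to partition 0.
        for i in targets or [0]:
            parts[i].append(row)
    return parts

orders = [
    {"o_id": 0, "c_id": 1},
    {"o_id": 1, "c_id": 1},
    {"o_id": 2, "c_id": 2},
    {"o_id": 3, "c_id": 1},
]
customers = [{"c_id": 1}, {"c_id": 2}, {"c_id": 3}]

order_parts = hash_partition(orders, "o_id", 2)
cust_parts = reference_partition(customers, order_parts, "c_id", "c_id")
stored_copies = sum(len(p) for p in cust_parts)
print(cust_parts, stored_copies)
```

Every order now finds its customer in the same partition, so the join needs no data movement, but customer 1 is stored twice because its orders are spread over both partitions.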

Towards learning a partitioning advisor with deep reinforcement learning

Hilprecht

Binnig

Röhm

2019

Proceedings of the Second International Workshop on Exploiting Artificial Intelligence Techniques for Data Management

Self Cite

Commercial data analytics products such as Microsoft Azure SQL Data Warehouse or Amazon Redshift provide ready-to-use scale-out database solutions for OLAP-style workloads in the cloud. While the provisioning of a database cluster is usually fully automated by cloud providers, customers typically still have to make important design decisions which were traditionally made by the database administrator, such as selecting the partitioning schemes. In this paper we introduce a learned partitioning advisor for analytical OLAP-style workloads based on Deep Reinforcement Learning (DRL). The main idea is that a DRL agent learns its decisions based on experience by monitoring the rewards for different workloads and partitioning schemes. We evaluate our learned partitioning advisor in an experimental evaluation with different database schemata and workloads of varying complexity. In the evaluation, we show that our advisor is not only able to find partitionings that outperform existing approaches for automated partitioning design but that it also can easily adjust to different deployments. This is especially important in cloud setups where customers can easily migrate their cluster to a new set of (virtual) machines.

“…To maximize data-locality, different data partitioning techniques have been proposed to avoid remote join operations for queries [7]. More generally, various advanced data placement and replication strategies have been proposed for data center storage systems to reduce the network overhead for particular workloads [8,9].…”

Section: Related Work (mentioning)

confidence: 99%

Euro-Par 2018: Parallel Processing

Aldinucci

Padovani

Torquati

2018

Lecture Notes in Computer Science

Large computing systems such as data centers are becoming the mainstream infrastructures for big data processing. As one of the key data operators in such scenarios, distributed joins remain challenging for current techniques because they always incur a significant network communication cost. Various advanced approaches have been proposed to improve the performance; however, most of them focus only on data skew handling, and algorithms designed specifically for communication reduction have received less attention. Moreover, although the state-of-the-art technique can minimize network traffic, it provides fine-grained optimal schedules for all individual join keys, which could result in obvious overhead. In this paper, we propose a new approach called LAS (Lightweight Locality-Aware Scheduling), which targets reducing network communication for large distributed joins in an efficient and effective manner. We present the detailed design and implementation of LAS, and conduct an experimental evaluation using large data joins. Our results show that LAS can effectively reduce scheduling overhead and achieve comparable performance on network reduction compared to the state-of-the-art.

“…Data partitioning is a principal factor in query optimization and processing [17]. It allows access to a subset of data if and when possible, which can improve the overall performance considerably by reducing I/O cost, boosting system throughput, increasing query parallelism, maximizing locality of joins and aggregations [29], and giving the opportunity for finer locking granularity [15].…”

Section: Introduction (mentioning)

confidence: 99%

Hybrid row-column partitioning in Teradata®

Al-Kateb

Sinclair

et al. 2016

Proc. VLDB Endow.

Data partitioning is an indispensable ingredient of database systems due to the performance improvement it can bring to any given mixed workload. Data can be partitioned horizontally or vertically. While some commercial proprietary and open source database systems have one flavor or mixed flavors of these partitioning forms, Teradata Database offers a unique hybrid row-column store solution that seamlessly combines both of these partitioning schemes. The key feature of this hybrid solution is that either row, column, or combined partitions are all stored and handled in the same way internally by the underlying file system storage layer. In this paper, we present the main characteristics and explain the implementation approach of Teradata's row-column store. We also discuss query optimization techniques applicable specifically to partitioned tables. Furthermore, we present a performance study that demonstrates how different partitioning options impact the performance of various queries.
