Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data 2015
DOI: 10.1145/2723372.2723718

Locality-aware Partitioning in Parallel Database Systems

Abstract: Parallel database systems horizontally partition large amounts of structured data in order to provide parallel data processing capabilities for analytical workloads in shared-nothing clusters. One major challenge when horizontally partitioning large amounts of data is to reduce the network costs for a given workload and a database schema. A common technique to reduce the network costs in parallel database systems is to co-partition tables on their join key in order to avoid expensive remote join operations. How…
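
The co-partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it only shows plain hash co-partitioning of two tables on a shared join key. The table contents, the column names (`custkey`, `orderkey`, `name`), the node count, and the helper functions `hash_partition` and `local_join` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key, num_nodes):
    """Assign each row to a node by taking its integer join key modulo num_nodes."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key] % num_nodes].append(row)
    return partitions


def local_join(left_parts, right_parts, key, num_nodes):
    """Join each node's left partition only with the same node's right partition."""
    result = []
    for node in range(num_nodes):
        # Build a per-node hash index on the right side.
        index = defaultdict(list)
        for r in right_parts.get(node, []):
            index[r[key]].append(r)
        # Probe with the left rows stored on the same node; no other node is read.
        for l in left_parts.get(node, []):
            for r in index.get(l[key], []):
                result.append({**r, **l})
    return result


# Hypothetical toy tables (assumption: integer join keys).
NUM_NODES = 3
customers = [{"custkey": k, "name": f"c{k}"} for k in range(6)]
orders = [{"orderkey": 100 + i, "custkey": i % 6} for i in range(12)]

# Both tables use the same partitioning function on the join key.
customer_parts = hash_partition(customers, "custkey", NUM_NODES)
order_parts = hash_partition(orders, "custkey", NUM_NODES)

joined = local_join(order_parts, customer_parts, "custkey", NUM_NODES)
```

Because both tables are partitioned by the same function on `custkey`, every order lands on the node that holds its matching customer, so the per-node joins together produce the full join result without moving rows between nodes.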

Cited by 64 publications (27 citation statements)
References 17 publications

“…Baselines. We first compared the partitionings found by our approaches to heuristics that are typically used by a database administrator [25]. For both simple and more complex star schemata (SSB and TPC-DS) this means that usually fact tables are co-partitioned with either the most frequently joined dimension table (Heuristic 1) or the largest dimension table (Heuristic 2).…”
Section: Workloads Setup and Baselines (mentioning)
confidence: 99%
“…Since this technique can be exploited if a system supports hash-partitioning by any attribute most partitioning advisors and also our technique indirectly make use of REF-partitioning. Zamanian et al. [25] extend this approach such that even more locality can be obtained but at the cost of higher replication. For this, the database has to support predicate-based reference partitioning.…”
Section: Related Work (mentioning)
confidence: 99%
“…To maximize data-locality, different data partitioning techniques have been proposed to avoid remote join operations for queries [7]. More generally, various advanced data placement and replication strategies have been proposed for data center storage systems to reduce the network overhead for particular workloads [8,9].…”
Section: Related Work (mentioning)
confidence: 99%
“…Data partitioning is a principal factor in query optimization and processing [17]. It allows access to a subset of data if and when possible, which can improve the overall performance considerably by reducing I/O cost, boosting system throughput, increasing query parallelism, maximizing locality of joins and aggregations [29], and giving the opportunity for finer locking granularity [15].…”
Section: Introduction (mentioning)
confidence: 99%