Proceedings of the 2021 International Conference on Management of Data 2021
DOI: 10.1145/3448016.3452831

To Partition, or Not to Partition, That is the Join Question in a Real System

Abstract: An efficient implementation of a hash join has been a highly researched problem for decades. Recently, the radix join has been shown to have superior performance over the alternatives (e.g., the non-partitioned hash join), albeit on synthetic microbenchmarks. Therefore, it is unclear whether one can simply replace the hash join in an RDBMS or use the radix join as a performance booster for selected queries. If the latter, it is still unknown when one should rely on the radix join to improve performance. In this…
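
The two join variants contrasted in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the tuple layout `(key, payload)`, and the single partitioning pass are assumptions made for illustration. Production radix joins typically partition in several passes and size partitions to fit the cache, which this sketch does not attempt.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Non-partitioned hash join: one hash table over the whole build side."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    # Probe the single table; .get() avoids inserting empty entries.
    return [(key, b, p) for key, p in probe for b in table.get(key, ())]


def radix_partition(rows, bits):
    """Split rows into 2**bits partitions using the low bits of the key hash."""
    mask = (1 << bits) - 1
    parts = [[] for _ in range(1 << bits)]
    for key, payload in rows:
        parts[hash(key) & mask].append((key, payload))
    return parts


def radix_join(build, probe, bits=4):
    """Partition both inputs the same way, then join partition pairs."""
    out = []
    build_parts = radix_partition(build, bits)
    probe_parts = radix_partition(probe, bits)
    for b_part, p_part in zip(build_parts, probe_parts):
        # Equal keys always land in the same partition index, so each
        # pair can be joined independently with a small hash table.
        out.extend(hash_join(b_part, p_part))
    return out
```

Both functions return the same set of matches. The difference is that the radix variant pays an extra partitioning pass over both inputs in exchange for smaller per-partition hash tables, which is the trade-off the paper evaluates in a real system.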

citations
Cited by 25 publications
(7 citation statements)
references
References 40 publications
“…In Fig. 23a, we place a preaggregation at 3, where it calculates the initial aggregates, while Γ merges and finalizes them. Figure 23b shows an additional preaggregation placed in between at 2, which merges the aggregates from its input.…”
Section: Finalization Computing An Expression On Aggregations (mentioning)
confidence: 99%
“…Re-using hash partitions, and even whole hash tables is a well-known optimization [18,36]. One often discussed question is, if hash tables should be partitioned or non-partitioned [3]. Our proposed approaches in Sect.…”
Section: Related Work (mentioning)
confidence: 99%
“…Consequently, there is a large body of related work that optimizes hash joins [39,49,55] and hash aggregations [38,52,61]. One often discussed question is, if hash tables should be partitioned or non-partitioned [3]. Our proposed approaches in Section 3 try to use a non-partitioned hash table to avoid materializing data, while using thread-local partitioning for heavy-hitters.…”
Section: Related Work (mentioning)
confidence: 99%
“…Order Benchmark (JOB)3: Since IMDb primarily stores facts as strings, we extract a separate table that contains the vote count and the user rating for movies, to allow statistics collection. On these columns, we define five additional aggregation queries that calculate statistics on the new numerical columns.…”
mentioning
confidence: 99%
“…This is an important problem because it occurs in every large database. The ideal solution would allow, despite the increase of the number of records in the tables, to perform operations on the database as quickly as at the time of its implementation [2].…”
Section: Introduction (mentioning)
confidence: 99%