Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data 2015
DOI: 10.1145/2723372.2750545

From Theory to Practice

Abstract: Big data analytics often requires processing complex queries using massive parallelism, where the main performance metric is the communication cost incurred during data reshuffling. In this paper, we describe a system that can efficiently compute complex join queries, including queries with cyclic joins, on a massively parallel architecture. We build on two independent lines of work for multi-join query evaluation: a communication-optimal algorithm for distributed evaluation, and a worst-case optimal algorith…

Citations: cited by 77 publications (9 citation statements)
References: 34 publications
“…The hypercube hash join was initially presented in [20] and was used in the distributed RDF store presented in [28]. The basic idea is that for each join variable one dimension is created.…”
Section: Decentralized Join (mentioning)
confidence: 99%
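The routing idea in the statement above (one hypercube dimension per join variable) can be sketched as follows. This is a minimal illustration, not the implementation from the cited works: the triangle query R(a,b), S(b,c), T(c,a), the dimension sizes, the helper names `bucket` and `hypercube_targets`, and the use of CRC32 as the hash function are all assumptions made for the example.

```python
import zlib
from itertools import product

def bucket(value, size):
    # Deterministic stand-in hash (assumption): CRC32 of the value's repr.
    return zlib.crc32(repr(value).encode("utf-8")) % size

def hypercube_targets(tuple_vars, values, dims):
    """Return every server coordinate that must receive one input tuple.

    dims maps each join variable to the size of its dimension. A variable
    bound by the tuple fixes one coordinate; an unbound variable forces
    replication along the whole dimension.
    """
    bound = dict(zip(tuple_vars, values))
    axes = []
    for var, size in dims.items():
        if var in bound:
            axes.append([bucket(bound[var], size)])
        else:
            axes.append(range(size))
    return set(product(*axes))

# Hypothetical 2 x 2 x 2 server grid for the variables a, b, c.
dims = {"a": 2, "b": 2, "c": 2}
r = hypercube_targets(("a", "b"), (1, 2), dims)  # tuple of R(a, b)
s = hypercube_targets(("b", "c"), (2, 3), dims)  # tuple of S(b, c)
t = hypercube_targets(("c", "a"), (3, 1), dims)  # tuple of T(c, a)

# The three tuples agree on a=1, b=2, c=3, so they meet on one server.
meeting_point = r & s & t
```

Each tuple is replicated only along the dimension of the variable it does not contain, and any three tuples that join share exactly one server, which is why the join can be evaluated locally after a single shuffle.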
“…Semantic Publishing Benchmark (SPB). The SPB 28 [74] is a benchmark motivated by the industry. The use case is a publisher organization that provides metadata about its published work.…”
Section: Benchmarks (mentioning)
confidence: 99%
“…This paper introduces multi-way joins in Squall (a multi-way join uses a single communication step, that is, it runs within a single component). These joins can outperform the corresponding pipelines of 2-way joins as they avoid shuffling intermediate data, which can be very large [8,74,26]. Multi-way joins are especially beneficial when the output of intermediate stages is big compared to the size of the base relations and/or final output.…”
Section: Novel Join Operators (mentioning)
confidence: 99%
“…We refer an interested reader to [18]. Unfortunately, as explained in [26], both works [8,18] do not handle the case when dimension sizes (obtained from solving the equations) are not integers. For instance, if we have 7 machines in total and 3 dimensions of the same size, each dimension is of size 7^(1/3) = 1.91.…”
Section: Multi-way Joins: General Case (mentioning)
confidence: 99%
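The arithmetic behind the 7-machine example can be checked with a short sketch. It only illustrates why a fractional dimension size is a problem; it is not the rounding scheme of any cited work, and the function name `equal_shares` is hypothetical.

```python
import math

def equal_shares(machines, dimensions):
    """Ideal equal dimension size and the machine counts after rounding."""
    ideal = machines ** (1.0 / dimensions)
    used_if_floor = math.floor(ideal) ** dimensions   # machines actually used
    needed_if_ceil = math.ceil(ideal) ** dimensions   # machines required
    return ideal, used_if_floor, needed_if_ceil

ideal, used_if_floor, needed_if_ceil = equal_shares(7, 3)
# ideal is about 1.91; rounding down uses 1 machine, rounding up needs 8.
```

With 7 machines, rounding every dimension down leaves 6 machines idle, while rounding up requires more machines than are available, so neither naive choice is satisfactory.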
“…This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation, one of the core tasks in massively parallel systems. Several papers [7,8,20,19] have studied the tradeoff between synchronization (number of rounds) and communication cost, and have proposed and analyzed known and new parallel algorithms [4,9]. Among these, the Hypercube algorithm [13,4] can compute any multiway join query in one round by properly distributing the input data.…”
Section: Introduction (mentioning)
confidence: 99%