2016 IEEE 32nd International Conference on Data Engineering (ICDE) 2016
DOI: 10.1109/icde.2016.7498250

Load balancing and skew resilience for parallel joins

Abstract: We address the problem of load balancing for parallel joins. We show that the distribution of input data received and the output data produced by worker machines are both important for performance. As a result, previous work, which optimizes either for input or output, stands ineffective for load balancing. To that end, we propose a multi-stage load-balancing algorithm which considers the properties of both input and output data through sampling of the original join matrix. To do this efficiently, we …

Citations: cited by 16 publications (9 citation statements)
References: 28 publications (28 reference statements)
“…In data stream processing, DYNAMIC [4] supports adaptive repartitioning according to the change of data streams. To ensure the load balancing and skew resilience, Aleksandar et al. [6] proposed a multi-stage load-balancing algorithm by using a novel category of equi-weight histograms. However, [4,6] assumes that the number of partitions must be 2^n.…”
Section: Related Work (mentioning)
confidence: 99%
“…To ensure the load balancing and skew resilience, Aleksandar et al. [6] proposed a multi-stage load-balancing algorithm by using a novel category of equi-weight histograms. However, [4,6] assumes that the number of partitions must be 2^n. So, the matrix structure suffers from bad flexibility.…”
Section: Related Work (mentioning)
confidence: 99%
“…Koumarelas et al deal with the issue of preprocessing the JM, when selectivity information is known a‐priori. Victorovic et al improves on the previous study for a specific form of JMs, termed as monotonic. Beame et al, Zhang et al, and Cu et al investigate the case where multiple relations are joined in a single step.…”
Section: Related Work (mentioning)
confidence: 89%
“…Input partitioning was also considered for the more general problem of distributed theta-join computation. Vitorovic et al [38] propose a new tiling algorithm to partition the join matrix in a balanced way, improving over earlier work by Okcan and Riedewald [27]. However, the authors themselves point out that for equi-joins one should instead rely on a specialized solution such as [2], because general theta-join approaches do not exploit the strong structural properties of key-equality based matching in equi-joins.…”
Section: Related Work (mentioning)
confidence: 99%