Skew-insensitive parallel algorithms for relational join

Alsabti, Khaled; Ranka, Sanjay

doi:10.1109/hipc.1998.738010

Cited by 3 publications

(3 citation statements)

References 19 publications


“…In static algorithms it is assumed that adequate information on skewed data is known before the application of the algorithm. [1], [4] and [11] expose static algorithms. On the contrary, [2], [6] and [12] propose techniques and algorithms according to which data skew is detected and encountered dynamically at run time.…”

Section: Related Work; citation type: mentioning (confidence: 99%)

“…Using the notion of the splitting values stored in a split vector, virtual processor partitioning [4] assigns multiple range partitions instead of one to each processor. Finally, authors in [1] assign a work weight function to each join attribute value in order to generate partitions of nearly equal weight.…”

Section: Related Work; citation type: mentioning (confidence: 99%)
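The split-vector and work-weight ideas in the statement above can be illustrated with a small sketch. It assumes, purely for illustration, that the work weight of a join value v is the number of output tuples it produces, f_R(v)·f_S(v); the excerpt does not give the exact weight function used in [1]. It also assumes that virtual processor partitioning maps ranges onto physical processors round-robin. The helper names (join_value_weights, split_vector, range_of, processor_of) are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

def join_value_weights(r_keys, s_keys):
    """Hypothetical work weight per join value: the number of output tuples
    it produces, f_R(v) * f_S(v). Values missing from either side are dropped."""
    fr, fs = Counter(r_keys), Counter(s_keys)
    return {v: fr[v] * fs[v] for v in fr.keys() & fs.keys()}

def split_vector(weights, n_ranges):
    """Choose up to n_ranges - 1 splitting values so that contiguous ranges of
    sorted join values carry roughly equal total weight."""
    total = sum(weights.values())
    splits, acc = [], 0
    for v in sorted(weights):
        acc += weights[v]
        # Cut after v once the running weight reaches the next equal-share boundary.
        if len(splits) < n_ranges - 1 and acc >= total * (len(splits) + 1) / n_ranges:
            splits.append(v)
    return splits

def range_of(v, splits):
    """Range index of v: range i holds values in (splits[i-1], splits[i]]."""
    return bisect_left(splits, v)

def processor_of(v, splits, p):
    """Virtual processor partitioning (assumed round-robin mapping):
    build more ranges than processors and deal them out to p processors."""
    return range_of(v, splits) % p
```

For example, with R keys [1, 1, 2, 2, 3, 3, 4, 4] and S keys [1, 2, 3, 4], every value has weight 2, split_vector(w, 2) returns [2], and split_vector(w, 4) returns [1, 2, 3], which processor_of maps onto two processors as 0, 1, 0, 1. Using more ranges than processors gives the mapping more freedom to even out loads, although a single very heavy value still lands in one range.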

“…In this paper we address the issue of join product skew. Various techniques and algorithms have been proposed in the literature to handle this type of skew ( [1], [4], [11], [2], [6], [12]). We introduce the notion of frequency classes, whose definition is based on the product of frequencies of the join attribute values.…”

Section: Introduction; citation type: mentioning (confidence: 99%)

A New Framework for Join Product Skew

Afrati, Kyritsis, Lekeas et al. 2010

Preprint


Different types of data skew can result in load imbalance in the context of parallel joins under the shared nothing architecture. We study one important type of skew, join product skew (JPS). A static approach based on frequency classes is proposed which takes for granted the data distribution of join attribute values. It comes from the observation that the join selectivity can be expressed as a sum of products of frequencies of the join attribute values. As a consequence, an appropriate assignment of join sub-tasks, that takes into consideration the magnitude of the frequency products can alleviate the join product skew. Motivated by the aforementioned remark, we propose an algorithm, called Handling Join Product Skew (HJPS), to handle join product skew.
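The observation in this abstract, that the join output size is a sum of per-value frequency products, can be made concrete with a short sketch. It computes |R ⋈ S| = Σ_v f_R(v)·f_S(v) and then assigns one sub-task per join value to processors, largest product first. The greedy longest-processing-time heuristic used here is a generic stand-in, not the HJPS algorithm, and the function names are hypothetical.

```python
import heapq
from collections import Counter

def product_tasks(r_keys, s_keys):
    """One join sub-task per shared join value v, weighted by f_R(v) * f_S(v),
    the number of result tuples that sub-task produces."""
    fr, fs = Counter(r_keys), Counter(s_keys)
    return {v: fr[v] * fs[v] for v in fr.keys() & fs.keys()}

def join_size(tasks):
    """Equi-join result size as a sum of frequency products."""
    return sum(tasks.values())

def assign_largest_first(tasks, p):
    """Greedy LPT heuristic (not HJPS itself): hand sub-tasks to the currently
    least-loaded of p processors, largest frequency product first."""
    heap = [(0, i) for i in range(p)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment, loads = {}, [0] * p
    for v, w in sorted(tasks.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        assignment[v] = i
        loads[i] = load + w
        heapq.heappush(heap, (loads[i], i))
    return assignment, loads
```

With R keys a, a, a, a, b, b, c, d and S keys a, a, a, b, b, c, c, d, the products are 12, 4, 2 and 1, the join has 19 tuples, and two processors end up with loads 12 and 7. The imbalance comes from the single product 12 exceeding the even share 19/2, which no assignment of whole sub-tasks can fix; such a sub-task would have to be split further, for instance by partitioning one side's tuples for that value and replicating the other side's, a remedy that may or may not match what HJPS does.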




Resource Discovery

Afrati, Kyritsis, Lekeas et al. 2012

Lecture Notes in Computer Science


Handling data skew in parallel joins in shared-nothing systems

Kostamaa, Zhou et al. 2008

Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data


Parallel processing continues to be important in large data warehouses. The processing requirements continue to expand in multiple dimensions. These include greater volumes, increasing number of concurrent users, more complex queries, and more applications which define complex logical, semantic, and physical data models. Shared nothing parallel database management systems [16] can scale up "horizontally" by adding more nodes. Most parallel algorithms, however, do not take into account data skew. Data skew occurs naturally in many applications. A query processing skewed data not only slows down its response time, but generates hot nodes, which become a bottleneck throttling the overall system performance. Motivated by real business problems, we propose a new join geography called PRPD (Partial Redistribution & Partial Duplication) to improve the performance and scalability of parallel joins in the presence of data skew in a shared-nothing system. Our experimental results show that PRPD significantly speeds up query elapsed time in the presence of data skew. Our experience shows that eliminating system bottlenecks caused by data skew improves the throughput of the whole system which is important in parallel data warehouses that often run high concurrency workloads.
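The abstract names PRPD but does not describe its mechanics. The sketch below encodes one reading of the name, stated here as an assumption: for join values known to be skewed in R, R's rows stay on the node where they already reside (partial redistribution), the matching rows of S are copied to every node (partial duplication), and all other rows of both relations are hash-redistributed on the join key. The function names, the (node, key) row representation and the use of zlib.crc32 as the partitioning hash are illustrative choices.

```python
from collections import Counter
from zlib import crc32

def home(key, p):
    """Hash partitioning on the join key (assumed hash function)."""
    return crc32(str(key).encode()) % p

def prpd_route(r_rows, s_rows, skewed_in_r, p):
    """Route rows given as (current_node, join_key) pairs.
    Returns per-node lists of R keys and S keys after redistribution."""
    r_out = [[] for _ in range(p)]
    s_out = [[] for _ in range(p)]
    for node, k in r_rows:
        if k in skewed_in_r:
            r_out[node].append(k)        # skewed R rows stay where they are
        else:
            r_out[home(k, p)].append(k)  # other R rows are hash-redistributed
    for _, k in s_rows:
        if k in skewed_in_r:
            for n in range(p):           # S rows matching skewed values are duplicated
                s_out[n].append(k)
        else:
            s_out[home(k, p)].append(k)  # other S rows are hash-redistributed
    return r_out, s_out

def local_join_size(r_out, s_out):
    """Sum of the per-node equi-join sizes after routing."""
    total = 0
    for r_keys, s_keys in zip(r_out, s_out):
        fs = Counter(s_keys)
        total += sum(fs[k] for k in r_keys)
    return total
```

Because every S row carrying a skewed value is present on every node, each locally kept R row still meets all of its matches, so in this sketch the union of the per-node joins equals the full join while the skewed R rows never move.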


