2014
DOI: 10.1016/j.procs.2014.05.014

Handling Data-skew Effects in Join Operations Using MapReduce

Citations: cited by 25 publications (24 citation statements)
References: 10 publications
“…The algorithm optimizes multiway join by minimizing duplicate tuples of input data and can detect whether an attribute is incorrectly contained in the keys of map side and then fix it. Hassan et al proposed a new frequency adaptive algorithm called MRFA‐Join for join operations of large‐scale data sets based on a MapReduce programming model. To solve the problem of buffering all records of internal and external relations, Blans et al proposed a semi‐join algorithm for log processing and improve the sorting and merging operations in MapReduce.…”
Section: Related Work (mentioning)
confidence: 99%
“…Regarding joins in a cloud computing environment, most of current works focus on proposing novel data skew handling techniques to improve the loadbalancing and scalability of join implementations in the presence of big data (e.g., SALA [35], SkewTune [36] and the approaches presented in [37], [38]), as opposed to the detailed implementation and evaluation of joins that is studied in this work. Moreover, the large scale data-analytics community has developed its own set of parallel processing paradigms and related join operations.…”
Section: Related Work (mentioning)
confidence: 99%
“…For a highly skewed join attribute value K, appropriate map keys are generated so that all records in each bucket associated to value K in one relation are forwarded to the same reducer holding all the corresponding buckets of other relation. This partitioning guarantees that join tasks, are generated in a manner that the input data for each join task will fit in the memory of processing node and never exceed a user defined size, even for highly skewed data [10].…”
Section: End If (mentioning)
confidence: 99%
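
The bucketing idea described in the statement above can be illustrated with a minimal sketch. This is not the MRFA-Join algorithm from the cited paper; it is a simplified fragment-and-replicate scheme run in a single process, and the function name `skew_aware_join`, the parameter `max_bucket_size`, and the round-robin bucket assignment are all assumptions made for illustration. Records of relation R that share a join value are spread over several buckets so that no join task receives more than `max_bucket_size` of them, and matching records of relation S are copied to every bucket of that value. Only the R side is bounded in this sketch.

```python
from collections import Counter, defaultdict
from math import ceil


def skew_aware_join(r, s, max_bucket_size=3):
    """Join two lists of (key, value) pairs, bounding R records per join task.

    Hypothetical illustration only: simulates map and reduce phases in memory.
    """
    # Frequency of each join value in R decides how many buckets it needs.
    freq = Counter(k for k, _ in r)
    n_buckets = {k: max(1, ceil(n / max_bucket_size)) for k, n in freq.items()}

    # Each join task is identified by a map key (join value, bucket index)
    # and holds the R values and S values routed to it.
    tasks = defaultdict(lambda: ([], []))

    # "Map" phase for R: spread records of one join value round-robin
    # over its buckets, so a skewed value is split across several tasks.
    seen = defaultdict(int)
    for k, v in r:
        bucket = seen[k] % n_buckets[k]
        seen[k] += 1
        tasks[(k, bucket)][0].append(v)

    # "Map" phase for S: replicate each record to every bucket of its value.
    for k, v in s:
        for bucket in range(n_buckets.get(k, 1)):
            tasks[(k, bucket)][1].append(v)

    # "Reduce" phase: each task joins its own R and S fragments locally.
    joined = [
        (k, rv, sv)
        for (k, _), (rvs, svs) in tasks.items()
        for rv in rvs
        for sv in svs
    ]
    loads = {key: len(rvs) for key, (rvs, _) in tasks.items()}
    return joined, loads
```

Calling `skew_aware_join(r, s, max_bucket_size=3)` returns the joined tuples together with the number of R records each task received, which makes it easy to check that a heavily repeated join value was split across several tasks instead of landing on one. The price of the bound is that S records for a skewed value are sent once per bucket.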
“…Business intelligence and large-scale data analysis have been recently the object of increased research activity using MapReduce model and especially in the evaluation of complex queries involving GroupBy-Joins using hash based approach [2,10,14]. GroupBy-joins still suffer from the effect of high redistribution cost, disk I/O and task imbalance in the presence of skewed data in large scale systems.…”
Section: Introduction (mentioning)
confidence: 99%