2017
DOI: 10.1016/j.jpdc.2017.02.007

Design and evaluation of small–large outer joins in cloud computing environments

Abstract: …similar or better performance under different outer join workloads and can thus be considered a new option for current data analysis applications. Moreover, our detailed experimental results provide insights into current small–large outer join implementations, allowing system developers to make a more informed choice for their data analysis applications.

citations
Cited by 29 publications
(18 citation statements)
references
References 36 publications
“…Most iterative applications require the intermediate results as inputs for the next iteration, resulting in wastage of time on the same set of data's I/O and shuffle. Thus, an instruction replacement method enables efficient small–large outer joins in decentralized environments; moreover, this method is easy to implement using existing predicates in data‐processing frameworks.…”
Section: Discussion (mentioning)
confidence: 99%
“…Cheng et al. investigated a global collection of statistics, redundant computation, data backup, and network access overhead. They proposed a partial redistribution and partial query method to improve performance and create a robust join operation with large datasets in a cluster environment. This study considers a similar situation wherein the join operation is replaced for datasets of various sizes to improve performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…In fact, these operations have been extensively studied in the field of data management, and large number of methods have been proposed to improve their performance. For example, for distributed join executions, current research focuses on the challenge on how to efficiently move data, either in the presence of different join workloads (e.g., skew) or different computing platforms (e.g., clusters and Cloud) or both [4], [5], [6], [11], [16], [20]. Their main target is either to reduce network traffic or to improve load-balancing or both, so as to balance computations and improve network communication time.…”
Section: Related Work (mentioning)
confidence: 99%