2014
DOI: 10.1007/978-3-319-09873-9_22

Robust and Efficient Large-Large Table Outer Joins on Distributed Infrastructures

Abstract: The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for personal research or study, educational, or not-for-profit purposes provided that:
• a full bibliographic reference is made to the original source
• a link is made to the metadata record in DRO
• the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders. Please consult the full DRO policy …

Cited by 19 publications (12 citation statements)
References 21 publications
“…We will investigate extensions to our design through the application of methods for skew handling [15,5,6,7], index size reduction [14] and incremental sorting [12,13] which should further improve performance. Our long term goal is to develop a highly scalable distributed analysis framework for extreme-scale RDF data.…”
Section: Results (mentioning)
confidence: 99%
“…(2) Cardinality: To see how the performance changes with increasing dataset size, similar as the evaluation works on joins [13] [14], we just fix the cardinality of relation R, to 25 million, and varying the |S| from 25 million to 400 million, a number which is extreme big for the available annotation pairs. As the skew handling is beyond the scope of this work, we just keep the data uniform distributed based on their first join key a as stated previously.…”
Section: A Benchmark Scenarios (mentioning)
confidence: 99%
“…Compared to these, in our previous work [7,8,9], we have employed the semijoin-alike pattern with full parallelism as a new distributed geography (namely not just a simple join operation) for handling data skew and apply it for parallel inner joins and outer joins directly. In this work, we focus on the inner joins (namely joins).…”
Section: Related Work (mentioning)
confidence: 99%
“…We conclude our analysis with the presentation of speedup using the very popular Hash algorithm as a baseline 9 , by analyzing the performance improvement achieved for joins in each algorithm for different numbers of nodes. Figure 9 presents the speedup ratio of PRPD, PRPQ and the Query algorithm over the basic hash method with increasing number of nodes from 2 (24 cores) to 16 and for skew values 1 and 1.4 respectively.…”
Section: Comparison With Hash-based Joins (mentioning)
confidence: 99%