2017
DOI: 10.14778/3137628.3137656

Runtime optimization of join location in parallel data management systems

Abstract: Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute UDFs on the joined tuples. The join can be done at the data storage nodes, corresponding to reduce-side joins, or by fetching data from the storage system to compute nodes, corresponding to map-side joins. Both may be suboptimal: reduce-side joins may cause skew, while map-side joins may lead to a lot of data being transferred a…
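The trade-off in the abstract can be made concrete with a per-key placement rule. The abstract is truncated here, so the sketch below is not the paper's algorithm; it is a minimal illustration under stated assumptions. It assumes hypothetical runtime statistics per join key (`KeyStats`), a cost measured only in bytes moved over the network, caching of fetched stored tuples at each compute node, and a hypothetical per-storage-node capacity (`node_capacity`) that stands in for skew. A key is joined at its storage node (reduce-side) only when that saves bytes and the owning node still has spare capacity; otherwise it is joined at the compute nodes (map-side).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class KeyStats:
    """Hypothetical per-key statistics gathered at runtime."""
    key: str
    storage_node: int   # storage node that owns the stored tuples for this key
    stream_tuples: int  # streaming tuples observed with this key
    stored_bytes: int   # bytes of stored tuples matching this key
    compute_nodes: int  # compute nodes that would each fetch (then cache) those tuples


def plan_join_locations(stats, stream_tuple_bytes, node_capacity):
    """Map each key to "storage" (reduce-side join) or "compute" (map-side join)."""

    def savings(s):
        # Bytes saved by shipping stream tuples to the storage node
        # instead of fetching the stored tuples to every compute node.
        fetch_bytes = s.compute_nodes * s.stored_bytes
        ship_bytes = s.stream_tuples * stream_tuple_bytes
        return fetch_bytes - ship_bytes

    load = defaultdict(int)  # stream tuples already routed to each storage node
    plan = {}
    # Keys with the largest savings get first claim on storage-node capacity.
    for s in sorted(stats, key=savings, reverse=True):
        fits = load[s.storage_node] + s.stream_tuples <= node_capacity
        if savings(s) > 0 and fits:
            plan[s.key] = "storage"
            load[s.storage_node] += s.stream_tuples
        else:
            # Either fetching is cheaper, or the owning storage node is
            # already saturated (the skew case), so join at compute nodes.
            plan[s.key] = "compute"
    return plan


stats = [
    KeyStats("a", storage_node=0, stream_tuples=1000, stored_bytes=100, compute_nodes=4),
    KeyStats("b", storage_node=0, stream_tuples=10, stored_bytes=5000, compute_nodes=4),
    KeyStats("c", storage_node=0, stream_tuples=50, stored_bytes=4000, compute_nodes=4),
]
print(plan_join_locations(stats, stream_tuple_bytes=8, node_capacity=40))
```

With these made-up numbers, key `a` is cheaper to fetch (map-side), key `b` is joined at storage node 0, and key `c` would also save bytes at the storage node but is sent to the compute nodes because node 0 is already at capacity. The greedy ordering by savings is a simplification chosen for brevity; a real system would also weigh UDF cost and recompute the plan as statistics change.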

References: 25 publications
Cited by 3 publications (3 citation statements)
“…Health Cao et al. (2010) Proceedings of the VLDB Endowment Computer Science Chai and Nayak (2018) Electronic Journal of Statistics Statistics & Probability Chandra and Sudarshan (2017) Proceedings of the VLDB Endowment Computer Science Chen et al. (2012) Communications of the Association for Information Systems Information Systems Cheng et al.…”
Section: Methodology Development (mentioning)
confidence: 99%
“…This can be mitigated by prefetching asynchronously, and dynamically deciding to prefetch only after a certain number of accesses to minimize the overhead of prefetching. This is similar to the classical ski-rental problem [19] and has been applied earlier in the context of join optimizations in parallel data management systems [20]. Extending COBRA to adapt heuristics from [14] to efficiently handle alternatives generated due to caching is part of future work, and dynamic approaches for prefetching are part of future work.…”
Section: Transformations (mentioning)
confidence: 98%
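The citation statement above describes prefetching only after a certain number of accesses, by analogy with the classical ski-rental problem. The sketch below shows the textbook deterministic break-even rule, not COBRA's or the cited paper's implementation. It assumes a fixed, hypothetical cost `remote_cost` per remote access ("rent") and a one-time cost `prefetch_cost` ("buy"), and that prefetched data is then served at no extra cost. Under those assumptions the rule pays less than twice what an oracle that knows the number of accesses in advance would pay.

```python
import math
from collections import defaultdict


class SkiRentalPrefetcher:
    """Deterministic break-even rule: pay per remote access until the
    accumulated cost would reach the prefetch cost, then prefetch."""

    def __init__(self, prefetch_cost, remote_cost):
        self.prefetch_cost = prefetch_cost
        self.remote_cost = remote_cost
        # Remote accesses tolerated per key before prefetching (the "rent" phase).
        self.threshold = max(0, math.ceil(prefetch_cost / remote_cost) - 1)
        self.remote_accesses = defaultdict(int)
        self.prefetched = set()
        self.total_cost = 0.0

    def access(self, key):
        if key in self.prefetched:
            return "local"  # assumption: prefetched data is served at no extra cost
        if self.remote_accesses[key] < self.threshold:
            self.remote_accesses[key] += 1
            self.total_cost += self.remote_cost
            return "remote"
        # Break-even reached: pay the prefetch cost once ("buy").
        self.prefetched.add(key)
        self.total_cost += self.prefetch_cost
        return "prefetch"


def offline_optimal(accesses, prefetch_cost, remote_cost):
    """Cost of an oracle that knows the access count in advance."""
    return min(accesses * remote_cost, prefetch_cost)


p = SkiRentalPrefetcher(prefetch_cost=10, remote_cost=1)
outcomes = [p.access("x") for _ in range(12)]
print(outcomes.count("remote"), outcomes.count("prefetch"), p.total_cost)
print(offline_optimal(12, 10, 1))
```

Here the first nine accesses are remote, the tenth triggers the prefetch, and the last two are local, for a total cost of 19 against an offline optimum of 10. If the key had been accessed only five times, the rule would never prefetch and would match the optimum exactly; that asymmetry is why the threshold guards against wasted prefetches.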
“…The approach provided better performance; however, it failed to enhance query processing performance. To address this issue, [10,11] presented a systematic process, driven by load testing and profiling data, that used software refactoring to reduce run time.…”
Section: Related Work (mentioning)
confidence: 99%