2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)
DOI: 10.1109/icde.2010.5447802

Incorporating partitioning and parallel plans into the SCOPE optimizer

Abstract: Massive data analysis on large clusters presents new opportunities and challenges for query optimization. Data partitioning is crucial to performance in this environment. However, data repartitioning is a very expensive operation, so minimizing the number of such operations can yield very significant performance improvements. A query optimizer for this environment must therefore be able to reason about data partitioning, including its interaction with sorting and grouping. SCOPE is a SQL-like scripting language u…

Cited by 76 publications (55 citation statements)
References 13 publications
“…It extends the work presented in [44] by introducing parallelization techniques for UDFs. An UDF is treated as a black-box operation.…”
Section: Parallelism In Traditional Data Flow (mentioning)
confidence: 89%
“…The scheduling policies are proposed to optimize the ETL workflow with respect to execution time and memory consumption. In the literature, there exist multiple methods that revolve around data flow parallelism [34, 36, 40, 43–45]. However, research on an ETL workflow parallelism has not appealed much consideration.…”
Section: ETL Workflow Optimization: Summary (mentioning)
confidence: 99%
“…In a distributed environment, an additional dimension is introduced into the join taxonomy: the join graph topology. Graph topologies specify how different partitions of data are processed in a distributed way, and is affected by the following factors [25]:…”
Section: Join Processing (mentioning)
confidence: 99%