Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data 2014
DOI: 10.1145/2588555.2610531

Dynamically optimizing queries over large scale data platforms

Abstract: Enterprises are adapting large-scale data processing platforms, such as Hadoop, to gain actionable insights from their "big data". Query optimization is still an open challenge in this environment due to the volume and heterogeneity of data, comprising both structured and un/semi-structured datasets. Moreover, it has become common practice to push business logic close to the data via user-defined functions (UDFs), which are usually opaque to the optimizer, further complicating cost-based optimization. As a resu…

Cited by 29 publications (26 citation statements)
References 36 publications
“…This proposal can be seen as an elegant adaptation of [39], proposed in a parallel database system, to a cloud system. More generally, with respect to the issue of query optimization in cloud environments, the most recent and relevant proposals are described in [11,41,53,61].…”
Section: Discussion (mentioning)
confidence: 99%
“…Researches on this purpose fall into two main approaches [25]. A first approach, called Single Point-based Optimization [5,17,18,20] consists in monitoring a plan execution so as to detect estimation errors and a resulting sub-optimality. This latter is corrected by interrupting the current execution and re-optimizing the remainder of the plan using up-to-date statistics.…”
Section: Preliminaries (mentioning)
confidence: 99%
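
The monitor-and-re-optimize loop described in the statement above can be sketched in a few lines. This is a minimal illustration only, not the method of the cited paper or of any of the citing works: the `Step` type, the `execute` and `reoptimize` callables, and the `threshold` ratio are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    estimated_rows: int  # the optimizer's estimate for this step's output size


def run_with_reoptimization(plan, execute, reoptimize, threshold=2.0):
    """Run a plan step by step and re-plan the remainder on a bad estimate.

    plan       -- list of Step, in execution order
    execute    -- hypothetical callable(Step) -> actual output row count
    reoptimize -- hypothetical callable(remaining_steps, observed) -> new step list
    threshold  -- assumed ratio between actual and estimated rows that
                  counts as an estimation error
    """
    observed = {}   # step name -> actual row count seen so far
    executed = []   # step names in the order they actually ran
    remaining = list(plan)
    while remaining:
        step = remaining.pop(0)
        actual = execute(step)
        observed[step.name] = actual
        executed.append(step.name)
        # Treat over- and under-estimates symmetrically; clamp to 1 to
        # avoid dividing by zero on empty results.
        larger = max(actual, step.estimated_rows, 1)
        smaller = max(min(actual, step.estimated_rows), 1)
        if remaining and larger / smaller > threshold:
            # Interrupt here and re-plan only what has not run yet,
            # using the statistics gathered so far.
            remaining = reoptimize(remaining, observed)
    return executed, observed
```

A caller would supply its own executor and planner; the sketch only fixes the control flow, in which completed steps are never redone and only the unexecuted remainder of the plan is replaced.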
“…A considerable body of literature was dedicated to find solutions to this problem. These solutions include mainly: (i) techniques for better quality of the statistical metadata [7,11,13,22,27,28], (ii) run-time techniques [5, 17-20] to monitor a query execution and trigger reoptimization of the plan when a sub-optimality is detected, and (iii) compile-time strategies [1-3, 9, 12] that permit the optimizer to generate an execution plan, being aware of the imprecision of used estimates.…”
Section: Introduction (mentioning)
confidence: 99%
“…Han et al [20] and Karanasos et al [23] both present their approaches to query optimization for distributed query execution by re-optimizing during execution using accurate statistic information about the data at the current stage of query execution. As we optimize query execution by hand, [23] show that collecting information such as the selectivity of predicates before query optimization only causes minor overhead.…”
Section: Related Work (mentioning)
confidence: 99%
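
To make "collecting the selectivity of predicates before query optimization" concrete, the sketch below estimates a predicate's selectivity from a random sample of rows. Sampling is an assumption made here for illustration; the statement above does not say how the cited work gathers the statistic. The function name `estimate_selectivity` and its parameters are hypothetical. The predicate is treated as a black box, so the same approach would apply to an opaque UDF.

```python
import random


def estimate_selectivity(rows, predicate, sample_size=1000, seed=0):
    """Estimate the fraction of rows that satisfy `predicate`.

    rows        -- any iterable of rows
    predicate   -- callable(row) -> bool; may be an opaque user-defined function
    sample_size -- assumed upper bound on how many rows are inspected
    seed        -- fixed seed so repeated runs inspect the same sample
    """
    rows = list(rows)
    if not rows:
        return 0.0
    rng = random.Random(seed)
    # Inspect every row when the input is small, otherwise a uniform sample.
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    matches = sum(1 for row in sample if predicate(row))
    return matches / len(sample)
```

The cost of the estimate is bounded by `sample_size` predicate evaluations, which is one way the overhead of gathering statistics up front can stay small relative to running the full query.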
“…As we optimize query execution by hand, [23] show that collecting information such as the selectivity of predicates before query optimization only causes minor overhead. This accurate information is required to determine which of the strategies we use in this paper is most efficient.…”
Section: Related Work (mentioning)
confidence: 99%