2013
DOI: 10.14778/2536222.2536223

Continuous cloud-scale query optimization and processing

Abstract: Massive data analysis in cloud-scale data centers plays a crucial role in making critical business decisions. High-level scripting languages free developers from understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is missing accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent usage of user-defined functions. In this paper we propose novel techniques to adapt …

Cited by 45 publications (35 citation statements)
References 24 publications
“…Such design-time optimization approaches incur higher overhead and are hence mostly applicable to traditional BI settings (i.e., ETL). Some recent approaches insist on the importance of having accurate statistics for creating an optimal execution of a data flow, both for design-time [41] and runtime scenarios [9]. Still, the challenges of efficiently gathering and exploiting such statistics metadata for optimizing data-intensive flows remain, due to the required close-to-zero overhead of the optimization process and the "right-time" data delivery demands of next-generation BI settings (i.e., ETO).…”
Section: Discussion (mentioning)
confidence: 99%
“…They show how to deal both with conventional (relational algebra) operators and with complex data transformations typical of next-generation data flows (e.g., sentiment or text analysis). The importance of using more accurate statistics for optimizing data flows in dynamic, cloud-scale environments has also been discussed in [9]. To deal with uncertainty when optimizing running data flows, they propose an approach that continuously monitors the execution of data flows at runtime, gathers statistics, and re-optimizes data flows on-the-fly to achieve better performance.…”
Section: Dynamicity (mentioning)
confidence: 99%
“…Bruno et al. [4] have proposed a technique that continuously monitors query execution, collects actual runtime statistics, and adapts execution plans as the query executes. The query optimizer is triggered whenever new runtime statistics become available.…”
Section: Continuous or Iterative Processing (mentioning)
confidence: 99%
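
The mechanism the two statements above describe (monitor execution, gather runtime statistics, re-optimize on the fly) can be pictured with a minimal sketch. All names below (Optimizer, Executor, best_plan, and so on) are hypothetical illustrations under assumed semantics, not the API of the paper's actual system:

```python
# Minimal sketch of a continuous re-optimization loop (hypothetical API,
# for illustration only; not the paper's implementation).

def run_with_continuous_optimization(query, optimizer, executor):
    stats = {}                                # runtime statistics observed so far
    plan = optimizer.best_plan(query, stats)  # initial plan from a priori estimates
    while not executor.finished():
        executor.run_next_stage(plan)         # execute one stage of the current plan
        new_stats = executor.collect_runtime_stats()
        if new_stats:                         # optimizer triggered by fresh statistics
            stats.update(new_stats)
            candidate = optimizer.best_plan(query, stats)
            # Completed work is kept; the remaining work switches to the
            # candidate plan only if it is estimated to be cheaper.
            if candidate.cost() < plan.remaining_cost():
                plan = candidate
    return executor.result()
```

The key design point is the trigger: the optimizer is not re-run on a fixed schedule but whenever new statistics arrive, so plan changes track the actual data observed during execution.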
“…Research on this problem falls into two main approaches [25]. The first approach, called single point-based optimization [5, 17, 18, 20], consists of monitoring a plan's execution so as to detect estimation errors and the resulting sub-optimality. The latter is corrected by interrupting the current execution and re-optimizing the remainder of the plan using up-to-date statistics.…”
Section: Preliminaries (mentioning)
confidence: 99%
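
To make the detection step concrete, here is a hedged sketch of the kind of check a single point-based optimizer might run at a plan checkpoint. The 2x threshold and the function name are assumptions for illustration, not taken from the cited systems:

```python
# Sketch of an estimation-error check at a plan checkpoint (assumed
# threshold and names; illustrative only). When the check fires, execution
# is interrupted and the remainder of the plan is re-optimized with
# up-to-date statistics.

def plan_is_suboptimal(estimated_rows: float, actual_rows: float,
                       threshold: float = 2.0) -> bool:
    """Return True when the observed cardinality deviates from the
    optimizer's estimate by more than `threshold`x in either direction."""
    if estimated_rows <= 0:
        return actual_rows > 0
    ratio = actual_rows / estimated_rows
    return ratio > threshold or ratio < 1.0 / threshold
```

A ratio test in both directions is one common way to flag sub-optimality, since both underestimation and overestimation of cardinalities can make the chosen plan a poor fit for the remaining work.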
“…A considerable body of literature has been dedicated to finding solutions to this problem. These solutions mainly include: (i) techniques for improving the quality of the statistical metadata [7, 11, 13, 22, 27, 28]; (ii) run-time techniques [5, 17-20] that monitor a query execution and trigger re-optimization of the plan when a sub-optimality is detected; and (iii) compile-time strategies [1-3, 9, 12] that allow the optimizer to generate an execution plan while being aware of the imprecision of the estimates it uses.…”
Section: Introduction (mentioning)
confidence: 99%