2013
DOI: 10.14778/2536222.2536223

Continuous cloud-scale query optimization and processing

Abstract: Massive data analysis in cloud-scale data centers plays a crucial role in making critical business decisions. High-level scripting languages free developers from understanding various system trade-offs, but introduce new challenges for query optimization. One key optimization challenge is missing accurate data statistics, typically due to massive data volumes and their distributed nature, complex computation logic, and frequent usage of user-defined functions. In this paper we propose novel techniques to adapt …

Cited by 45 publications (35 citation statements)
References 24 publications
“…Such design-time optimization approaches incur higher overhead and are hence mostly applicable to traditional BI settings (i.e., ETL). Some recent approaches insist on the importance of having accurate statistics for creating an optimal execution of a data flow, both for design-time [41] and runtime scenarios [9]. Still, the challenges of efficiently gathering and exploiting such statistics metadata for optimizing data-intensive flows remain, due to the required close-to-zero overhead of the optimization process and the "right-time" data delivery demands of next-generation BI settings (i.e., ETO).…”
Section: Discussion (mentioning)
confidence: 99%
“…They show how to deal both with conventional (relational algebra) operators and with complex data transformations typical of next-generation data flows (e.g., sentiment or text analysis). The importance of using more accurate statistics for optimizing data flows in dynamic, cloud-scale environments has also been discussed in [9]. To deal with uncertainty when optimizing running data flows, they propose an approach that continuously monitors the execution of data flows at runtime, gathers statistics, and re-optimizes data flows on-the-fly to achieve better performance.…”
Section: Dynamicity (mentioning)
confidence: 99%
“…Bruno et al. [4] have proposed a technique that continuously monitors query execution, collects actual runtime statistics, and adapts execution plans as the query executes. The query optimizer is triggered whenever new runtime statistics become available.…”
Section: Continuous or Iterative Processing (mentioning)
confidence: 99%
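
The mechanism the two statements above describe (monitor execution, gather runtime statistics, re-optimize on the fly) can be pictured with a minimal sketch. All names below (Optimizer, Executor, best_plan, and so on) are hypothetical illustrations under assumed semantics, not the API of the paper's actual system:

```python
# Minimal sketch of a continuous re-optimization loop (hypothetical API,
# for illustration only; not the paper's implementation).

def run_with_continuous_optimization(query, optimizer, executor):
    stats = {}                                # runtime statistics observed so far
    plan = optimizer.best_plan(query, stats)  # initial plan from a priori estimates
    while not executor.finished():
        executor.run_next_stage(plan)         # execute one stage of the current plan
        new_stats = executor.collect_runtime_stats()
        if new_stats:                         # optimizer triggered by fresh statistics
            stats.update(new_stats)
            candidate = optimizer.best_plan(query, stats)
            # Completed work is kept; the remaining work switches to the
            # candidate plan only if it is estimated to be cheaper.
            if candidate.cost() < plan.remaining_cost():
                plan = candidate
    return executor.result()
```

The key design point is the trigger: the optimizer is not re-run on a fixed schedule but whenever new statistics arrive, so plan changes track the actual data observed during execution.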
“…Research on this problem falls into two main approaches [25]. The first approach, called single point-based optimization [5, 17, 18, 20], consists of monitoring a plan's execution so as to detect estimation errors and the resulting sub-optimality. The latter is corrected by interrupting the current execution and re-optimizing the remainder of the plan using up-to-date statistics.…”
Section: Preliminaries (mentioning)
confidence: 99%
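
To make the detection step concrete, here is a hedged sketch of the kind of check a single point-based optimizer might run at a plan checkpoint. The 2x threshold and the function name are assumptions for illustration, not taken from the cited systems:

```python
# Sketch of an estimation-error check at a plan checkpoint (assumed
# threshold and names; illustrative only). When the check fires, execution
# is interrupted and the remainder of the plan is re-optimized with
# up-to-date statistics.

def plan_is_suboptimal(estimated_rows: float, actual_rows: float,
                       threshold: float = 2.0) -> bool:
    """Return True when the observed cardinality deviates from the
    optimizer's estimate by more than `threshold`x in either direction."""
    if estimated_rows <= 0:
        return actual_rows > 0
    ratio = actual_rows / estimated_rows
    return ratio > threshold or ratio < 1.0 / threshold
```

A ratio test in both directions is one common way to flag sub-optimality, since both underestimation and overestimation of cardinalities can make the chosen plan a poor fit for the remaining work.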
“…A considerable body of literature has been dedicated to finding solutions to this problem. These solutions mainly include: (i) techniques for improving the quality of the statistical metadata [7, 11, 13, 22, 27, 28]; (ii) run-time techniques [5, 17-20] that monitor a query execution and trigger re-optimization of the plan when a sub-optimality is detected; and (iii) compile-time strategies [1-3, 9, 12] that allow the optimizer to generate an execution plan while being aware of the imprecision of the estimates it uses.…”
Section: Introduction (mentioning)
confidence: 99%