Proceedings of the 2021 International Conference on Management of Data 2021
DOI: 10.1145/3448016.3457546

Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload

Abstract: The use of deep learning models for forecasting the resource consumption patterns of SQL queries has recently been a popular area of study. While these models have demonstrated promising accuracy, training them over large-scale industry workloads is expensive. Space inefficiencies of encoding techniques over large numbers of queries and excessive padding used to enforce shape consistency across diverse query plans imply 1) longer model training time and 2) the need for expensive, scaled up infrastructure t…

Cited by 18 publications (6 citation statements)
References 32 publications
“…Recently, it is becoming a common practice to build large-scale commercial data lakes on cloud infrastructure [144], [148], [152]. Cloud-based storage choices include single-cloud, multi-cloud and a hybrid of cloud and on-premise platforms [148].…”
Section: Cloud Data Lakes (mentioning)
confidence: 99%
“…Others focus on refining traditional cost models and plan enumeration algorithms. For learned cost models, [41] and [55,57] utilize Tree-LSTM and convolution models to learn cost of single and concurrent queries, respectively. Plan enumeration is often modelled as a reinforcement learning problem on deciding the best join order of tables.…”
Section: Related Work (mentioning)
confidence: 99%
“…Cardinality/selectivity estimation, has improved considerably leveraging ML [17,70,77,78,84]. Likewise for query optimization [27,44,45], indexes [9,10,30,49], cost estimation [63,83], workload forecasting [85], DB tuning [34,68,81], synthetic data generation [7,54,76], etc.…”
Section: Introduction (mentioning)
confidence: 99%