2020
DOI: 10.48550/arxiv.2003.06613
Preprint

ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning

Abstract: As more and more organizations rely on data-driven decision making, large-scale analytics becomes increasingly important. However, analysts are often stuck waiting for exact results. Organizations therefore turn to cloud providers whose infrastructure can analyze large quantities of data efficiently, but rising costs force them to optimize their usage. A cheap alternative that provides speed and efficiency would go a long way. Concretely, we offer a solution that can provide app…

Cited by 3 publications (4 citation statements)
References 35 publications
“…QuickSel [23] is also an earlier method of using a neural network to solve the cardinality estimation problem by fitting the data distribution. ML-AQP [24] leverages the query workload-driven idea to define the AQP problem as a supervised learning task. It uses a regression model to learn the mapping from queries to aggregate function values in the logs.…”
Section: Related Work, mentioning
confidence: 99%
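The supervised framing described in this statement can be illustrated with a small sketch. It assumes a hypothetical table `t(x, y)`, a workload of range queries `SELECT AVG(y) FROM t WHERE x BETWEEN lo AND hi`, and a query encoding that is just the two bounds; the nearest-neighbour regressor is a stand-in chosen for brevity, not the model used by ML-AQP. Past queries and their exact results form the training set, and a new query is then answered from the model without scanning the table.

```python
import heapq
import random
from statistics import mean

# Hypothetical base table t(x, y): x uniform on [0, 100), y = 2x + Gaussian noise.
_rng = random.Random(0)
TABLE = []
for _ in range(5000):
    x = _rng.uniform(0, 100)
    TABLE.append((x, 2 * x + _rng.gauss(0, 1)))


def exact_avg(lo, hi):
    """Exact answer to SELECT AVG(y) FROM t WHERE x BETWEEN lo AND hi (full scan)."""
    ys = [y for x, y in TABLE if lo <= x <= hi]
    return mean(ys) if ys else None


def featurize(lo, hi):
    """Hypothetical query encoding: the bounds of the range predicate."""
    return (lo, hi)


def build_query_log(n_queries, rng):
    """Simulate a workload log of (query features, exact result) pairs."""
    log = []
    while len(log) < n_queries:
        lo = rng.uniform(0, 90)
        hi = lo + rng.uniform(5, 10)
        result = exact_avg(lo, hi)
        if result is not None:
            log.append((featurize(lo, hi), result))
    return log


class KNNRegressor:
    """Stand-in regressor: averages the results of the k most similar logged queries."""

    def __init__(self, k=5):
        self.k = k
        self.log = []

    def fit(self, log):
        self.log = list(log)
        return self

    def predict(self, features):
        def sq_dist(entry):
            # Squared Euclidean distance between a logged query and the new one.
            return sum((a - b) ** 2 for a, b in zip(entry[0], features))

        nearest = heapq.nsmallest(self.k, self.log, key=sq_dist)
        return mean(result for _, result in nearest)


# Usage: train on the simulated log, then answer a new query from the model alone.
model = KNNRegressor(k=5).fit(build_query_log(500, random.Random(1)))
approx = model.predict(featurize(40, 47))
exact = exact_avg(40, 47)
print(f"approx={approx:.2f} exact={exact:.2f}")
```

A k-nearest-neighbour model keeps the sketch dependency-free; a gradient-boosted or other regressor could replace `KNNRegressor` while keeping the same log-to-model pipeline.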
“…However, the majority of AQP systems use stratified sampling based on prior knowledge of the data distributions (which might not always be available) [17], [18], [19]. Specifically, it has been demonstrated that uniform random samples are less effective for answering "Group By" queries, which are important when conducting exploratory data analysis, while biased sampling shows better efficiency for these sorts of tasks [20].…”
Section: Related Work, mentioning
confidence: 99%
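The effect described in this statement is easy to reproduce. The sketch compares a uniform sample with a stratified sample on a hypothetical table where one group holds only 0.1% of the rows; the group names, sizes and sample budgets are illustrative assumptions, and the code is not the sampler of any cited system.

```python
import random
from collections import defaultdict
from statistics import mean


def make_table(rng, n_common=99_900, n_rare=100):
    """Hypothetical table of (group, value) rows with one very rare group."""
    rows = [("common", rng.gauss(10, 2)) for _ in range(n_common)]
    rows += [("rare", rng.gauss(50, 2)) for _ in range(n_rare)]
    return rows


def uniform_sample(rows, n, rng):
    """Uniform random sample without replacement: every row equally likely."""
    return rng.sample(rows, n)


def stratified_sample(rows, per_group, rng):
    """Stratified sample: draw up to `per_group` rows uniformly from each group."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[0]].append(row)
    sample = []
    for members in by_group.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


def group_by_avg(sample):
    """Approximate SELECT g, AVG(v) FROM t GROUP BY g over a sample."""
    acc = defaultdict(list)
    for g, v in sample:
        acc[g].append(v)
    return {g: mean(vs) for g, vs in acc.items()}


def expected_rows(rows, group, n):
    """Expected number of rows from `group` in a uniform sample of size n."""
    return n * sum(1 for g, _ in rows if g == group) / len(rows)


rng = random.Random(7)
table = make_table(rng)
# Same 300-row budget: the stratified sample takes 200 "common" rows and all 100 "rare" rows.
uniform_est = group_by_avg(uniform_sample(table, 300, rng))
stratified_est = group_by_avg(stratified_sample(table, 200, rng))
print("groups seen by uniform sample:", sorted(uniform_est))
print("groups seen by stratified sample:", sorted(stratified_est))
```

With a 300-row uniform sample the rare group is expected to contribute only 0.3 rows, so it usually disappears from the GROUP BY result, whereas the stratified sample always covers it. Per-group averages need no reweighting because each stratum is sampled uniformly; totals across groups (SUM, COUNT) would have to scale each row by its group's inverse sampling rate.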
“…Although this method is similar to our proposed method, one significant difference lies in the way the DL model is used: while this method uses the model to generate samples that closely follow the dataset distributions and then executes the queries on these samples, our method relies on the intrinsic structure of the LSTM network both to learn the dataset distributions and to produce the approximate result. Similar to our approach, this work utilized ML models to approximate aggregate SQL queries [19]. Specifically, Gradient Boosting Machines (GBM), XGBoost and LightGBM were trained to predict the results of aggregate queries.…”
Section: Related Work, mentioning
confidence: 99%
“…For instance, the DBEst query processing engine [6] trains models, notably regression models and density estimators, that provide accurate, efficient, and cost-effective responses to different types of aggregate queries. Learning-based AQP (LAQP) [7] and ML-AQP [8] build machine learning models from historically executed queries. The former builds an error model to predict each incoming query's sampling-based estimation error, whereas the latter trains models that learn patterns to predict future query results with bounded error, using prediction intervals constructed from Quantile Regression models.…”
Section: Introduction, mentioning
confidence: 99%
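Prediction intervals built from quantile models, as this statement describes for ML-AQP, can be sketched as follows. The example swaps in a simple nearest-neighbour conditional-quantile estimator for the quantile regression models the statement mentions, reduces each query to a single hypothetical numeric feature `x`, and simulates the query log; it is an illustration under those assumptions, not the ML-AQP implementation. The quantiles at α/2 and 1 − α/2 bound the predicted result, and held-out queries check that the nominal coverage is roughly achieved.

```python
import heapq
import random
from statistics import quantiles


def simulate_log(n, rng):
    """Hypothetical query log: one numeric query feature x and its observed result y."""
    log = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        # Noise grows with x, so a fixed-width error bound would fit poorly.
        log.append((x, 3 * x + rng.gauss(0, 1 + 0.2 * x)))
    return log


class KNNQuantileRegressor:
    """Estimates conditional quantiles of the result from the k nearest logged queries."""

    def __init__(self, log, k=50):
        self.log = list(log)
        self.k = k

    def predict_interval(self, x, alpha=0.1):
        """Return (lower, median, upper) for a nominal (1 - alpha) prediction interval.

        Assumes 100 * alpha / 2 is an integer so the percentile lookup is exact.
        """
        nearest = heapq.nsmallest(self.k, self.log, key=lambda entry: abs(entry[0] - x))
        ys = [y for _, y in nearest]
        cuts = quantiles(ys, n=100)  # cut points at the 1st..99th percentiles
        lower = cuts[round(100 * alpha / 2) - 1]
        upper = cuts[round(100 * (1 - alpha / 2)) - 1]
        return lower, cuts[49], upper


def empirical_coverage(model, test_log, alpha=0.1):
    """Fraction of held-out queries whose true result lies inside the interval."""
    hits = 0
    for x, y in test_log:
        lower, _, upper = model.predict_interval(x, alpha)
        hits += lower <= y <= upper
    return hits / len(test_log)


rng = random.Random(42)
model = KNNQuantileRegressor(simulate_log(2000, rng), k=50)
lower, median, upper = model.predict_interval(5.0)
coverage = empirical_coverage(model, simulate_log(300, rng))
print(f"interval at x=5: [{lower:.2f}, {upper:.2f}], coverage={coverage:.2%}")
```

Because the interval comes from conditional quantiles rather than a fixed ± error, it widens automatically where results are noisier (larger `x` in this simulation).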