Query optimizers are notorious for inaccurate cost estimates, which lead to poor performance. The root of the problem lies in inaccurate cardinality estimates, i.e., the sizes of intermediate (and final) results in a query plan. These estimates also determine the resources consumed in modern shared cloud infrastructures. In this paper, we present CARDLEARNER, a machine learning based approach that learns cardinality models from previous job executions and uses them to predict cardinalities in future jobs. The key intuition behind our approach is that shared cloud workloads are often recurring and overlapping in nature, so we can learn cardinality models for the overlapping subgraph templates. We discuss various learning approaches and show how learning a large number of smaller models yields high accuracy and explainability. We further present an exploration technique that avoids learning bias by considering alternate join orders and learning cardinality models over them. We describe the feedback loop that applies the learned models to future job executions. Finally, we present a detailed evaluation of our models (up to 5 orders of magnitude less error), query plans (60% applicability), performance (up to 100% faster, 3× fewer resources), and exploration (optimal plans within a few tens of executions).
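To make the per-template idea concrete, the following Python sketch (our own illustration under stated assumptions, not the CARDLEARNER implementation) keeps one small regression model per recurring subgraph template, trained on cardinalities observed in past executions, and falls back to the optimizer's default estimate when no model exists. The class name, feature encoding, and template-hashing scheme are assumptions made for the example.

from collections import defaultdict
import numpy as np
from sklearn.linear_model import LinearRegression

class TemplateCardinalityModels:
    """One small cardinality model per recurring subgraph template (illustrative)."""

    def __init__(self, min_samples=5):
        self.min_samples = min_samples
        self.history = defaultdict(list)   # template hash -> [(features, actual_rows)]
        self.models = {}                   # template hash -> fitted model

    def record(self, template_hash, features, actual_rows):
        # Observed cardinality from a past job execution of this subgraph template.
        self.history[template_hash].append((features, actual_rows))

    def train(self):
        # Fit a separate small model for every template with enough observations.
        for tmpl, obs in self.history.items():
            if len(obs) < self.min_samples:
                continue
            X = np.array([f for f, _ in obs], dtype=float)
            # Learn log-cardinality so errors are relative rather than absolute.
            y = np.log1p(np.array([rows for _, rows in obs], dtype=float))
            self.models[tmpl] = LinearRegression().fit(X, y)

    def predict(self, template_hash, features, default_estimate):
        # Use the learned model when one exists; otherwise keep the optimizer's
        # default (e.g., histogram-based) estimate.
        model = self.models.get(template_hash)
        if model is None:
            return default_estimate
        return float(np.expm1(model.predict(np.array([features], dtype=float))[0]))

At optimization time, the optimizer would hash each candidate subgraph template, look it up in such a store, and substitute the learned estimate for the default one, which corresponds to the feedback loop described above.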
Query processing over big data is ubiquitous in modern clouds, where the system takes care of picking both the physical query execution plans and the resources needed to run those plans, using a cost-based query optimizer. A good cost model, therefore, is key to better resource efficiency and lower operational costs. Unfortunately, production workloads at Microsoft show that costs are very complex to model for big data systems. In this work, we investigate two key questions: (i) can we learn accurate cost models for big data systems, and (ii) can we integrate the learned models within the query optimizer? To answer these, we make three core contributions. First, we exploit workload patterns to learn a large number of individual cost models and combine them to achieve high accuracy and coverage over a long period. Second, we propose extensions to the Cascades framework to pick optimal resources, i.e., the number of containers, during query planning. And third, we integrate the learned cost models within the Cascades-style query optimizer of SCOPE at Microsoft. We evaluate the resulting system, Cleo, in a production environment using both production and TPC-H workloads. Our results show that the learned cost models are 2 to 3 orders of magnitude more accurate and 20× better correlated with actual runtimes, and that a large majority (70%) of the plan changes lead to substantial improvements in both latency and resource usage.
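To illustrate the resource-aware costing idea, here is a minimal Python sketch (an assumption-laden illustration, not Cleo's actual Cascades extension): during costing, each candidate operator or stage is evaluated under several container counts using a learned cost model, and the cheapest combination is kept. The predict_cost function is a stand-in for a learned model, and the candidate grid is hypothetical.

def predict_cost(op_features, num_containers):
    # Stand-in for a learned cost model: assume runtime shrinks with more
    # containers (diminishing returns) plus a per-container scheduling overhead.
    base = op_features["estimated_rows"] * op_features["cost_per_row"]
    return base / num_containers + 0.05 * num_containers

def choose_resources(op_features, candidate_containers=(1, 2, 4, 8, 16, 32)):
    # Pick the container count that minimizes predicted cost for this operator;
    # a Cascades-style optimizer would run this while costing each candidate
    # sub-plan, so that plan choice and resource choice are made jointly.
    best_n, best_cost = None, float("inf")
    for n in candidate_containers:
        cost = predict_cost(op_features, n)
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n, best_cost

# Hypothetical example: a stage expected to produce a million rows.
print(choose_resources({"estimated_rows": 1e6, "cost_per_row": 1e-5}))

The point of the sketch is the joint search: because the predicted cost depends on both the plan's features and the allotted containers, the optimizer can trade a slightly slower plan for one that uses far fewer resources, which is what enables picking "optimal resources" during planning rather than after it.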
The suitability of xanthan gum (XG)-locust bean gum (LBG), carrageenan (CAR)-LBG, and XG-CAR blends in 1:1 proportion at 0.42% in the formulation was assessed in the manufacture of Mozzarella cheese analogue. The stabilizer blends did not significantly influence the composition, texture profile, organoleptic and baking qualities, or pizza-related characteristics of the cheese analogues. Considering the influence of the stabilizer blend on the sensory quality of the analogue and the sensory rating of the pizza pie, the XG-LBG blend (1:1) was preferred over XG-CAR and CAR-LBG.