Nikhil Sheoran scite author profile

The goal of Approximate Query Processing (AQP) is to provide very fast but "accurate enough" results for costly aggregate queries thereby improving user experience in interactive exploration of large datasets. Recently proposed Machine-Learning-based AQP techniques can provide very low latency as query execution only involves model inference as compared to traditional query processing on database clusters. However, with increase in the number of filtering predicates (WHERE clauses), the approximation error significantly increases for these methods. Analysts often use queries with a large number of predicates for insights discovery. Thus, maintaining low approximation error is important to prevent analysts from drawing misleading conclusions. In this paper, we propose ELECTRA, a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors. ELECTRA uses a conditional generative model that learns the conditional distribution of the data and at run-time generates a small (≈ 1000 rows) but representative sample, on which the query is executed to compute the approximate result. Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for large number of predicates compared to baselines.

show abstract

A Step Toward Deep Online Aggregation

Sheoran

Chockchowwat

Arav

et al. 2023

Proc. ACM Manag. Data

View full text Add to dashboard Cite

For exploratory data analysis, it is often desirable to know what answers you are likely to get before actually obtaining those answers. This can potentially be achieved by designing systems to offer the estimates of a data operation result-say op(data)-earlier in the process based on partial data processing. Those estimates continuously refine as more data is processed and finally converge to the exact answer. Unfortunately, the existing techniques-called Online Aggregation (OLA)-are limited to a single operation; that is, we cannot obtain the estimates for op(op(data)) or op(...(op(data))). If this Deep OLA becomes possible, data analysts will be able to explore data more interactively using complex cascade operations. In this work, we take a step toward Deep OLA with evolving data frames (edf), a novel data model to offer OLA for nested ops-op(...(op(data)))-by representing an evolving structured data (with converging estimates) that is closed under set operations. That is, op(edf) produces yet another edf; thus, we can freely apply successive operations to edf and obtain an OLA output for each op. We evaluate its viability with Wake, an edf-based OLA system, by examining against state-of-the-art OLA and non-OLA systems. In our experiments on TPC-H dataset, Wake produces its first estimates 4.93× faster (median)-with 1.3× median slowdown for exact answers-compared to conventional systems. Besides its generality, Wake is also 1.92× faster (median) than existing OLA systems in producing estimates of under 1% relative errors.

show abstract

Multi-touch Attribution for Complex B2B Customer Journeys using Temporal Convolutional Networks

Agrawal¹,

Sheoran

Suman³

et al. 2022

View full text Add to dashboard Cite

DeepOLA: Online Aggregation for Deeply Nested Queries

Sheoran

2022

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Nikhil Sheoran

Efficient Insights Discovery through Conditional Generative Model based Query Approximation

Conditional Generative Model Based Predicate-Aware Query Approximation

A Step Toward Deep Online Aggregation

Multi-touch Attribution for Complex B2B Customer Journeys using Temporal Convolutional Networks

DeepOLA: Online Aggregation for Deeply Nested Queries

Contact Info

Product

Resources

About