The Monte Carlo database system

Jampani, Ravi; Xu, Fei; Wu, Mingxi; Perez, Luis L.; Jermaine, Chris; Haas, Peter J.

doi:10.1145/2000824.2000828

Cited by 35 publications

(5 citation statements)

References 34 publications


“…Unfortunately, generalizing from discrete to continuous distributions usually comes with substantial mathematical overhead. While several systems [2,24,35] handle continuous probability distributions, only recently [21,22], Grohe and Lindner proposed a general framework for rigorously dealing with probabilistic databases over continuous domains. Moreover, they establish basic properties such as the measurability of relational calculus and Datalog queries, which in turn allows for formally specifying the semantics of queries over continuous probabilistic databases.…”

Section: Introduction (mentioning)

confidence: 99%


Generative Datalog with Continuous Distributions

Grohe, Kaminski, Katoen, et al. 2020

Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems


Arguing for the need to combine declarative and probabilistic programming, Bárány et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more foundational approach towards defining its semantics. It is based on standard notions from probability theory known as stochastic kernels and Markov processes. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by Bárány et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and we show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.

CCS Concepts: • Mathematics of computing → Probabilistic representations; • Theory of computation → Constraint and logic programming; Database query languages (principles); Incomplete, inconsistent, and uncertain databases.



“…Finally, we mention MCDB [24] and its successor SimSQL [6]. Here, users are able to specify probabilistic models in the shape of random database instances.…”

Section: Introduction (mentioning)

confidence: 99%


“…Cambronero et al. [13] integrate probabilities into a relational database system to support imputation, while Hilprecht et al. [39] use probabilistic circuits to improve query performance. Jampani et al. [42] use probabilistic databases to support random data generation and simulation. Cai et al. [12] provide Gibbs sampling support in the space of database tables to a SQL-like language, enabling Bayesian machine learning workloads such as linear regression or latent Dirichlet allocation.…”

Section: Related Work (mentioning)

confidence: 99%

“…For these models, this usually amounts to a program transformation [73], or e.g., a costly matrix inversion [49]. Likewise, for both exact and […] random variables x_n converges to x, then for every continuous function f, f(x_n) converges to f(x).…”

Section: :30 (mentioning)

confidence: 99%

GenSQL: A Probabilistic Programming System for Querying Generative Models of Database Tables

Huot, Ghavami, Lew, et al. 2024

Proc. ACM Program. Lang.


This article presents GenSQL, a probabilistic programming system for querying probabilistic generative models of database tables. By augmenting SQL with only a few key primitives for querying probabilistic models, GenSQL enables complex Bayesian inference workflows to be concisely implemented. GenSQL’s query planner rests on a unified programmatic interface for interacting with probabilistic models of tabular data, which makes it possible to use models written in a variety of probabilistic programming languages that are tailored to specific workflows. Probabilistic models may be automatically learned via probabilistic program synthesis, hand-designed, or a combination of both. GenSQL is formalized using a novel type system and denotational semantics, which together enable us to establish proofs that precisely characterize its soundness guarantees. We evaluate our system on two real-world case studies (anomaly detection in clinical trials and conditional synthetic data generation for a virtual wet lab) and show that GenSQL more accurately captures the complexity of the data as compared to common baselines. We also show that the declarative syntax in GenSQL is more concise and less error-prone as compared to several alternatives. Finally, GenSQL delivers a 1.7-6.8x speedup compared to its closest competitor on a representative benchmark set and runs in comparable time to hand-written code, in part due to its reusable optimizations and code specialization.


“…In-database techniques are used for exact evaluation of tractable queries in tuple-independent probabilistic databases, e.g., using safe plans [6] as discussed below, and also for approximate evaluation of hard queries, e.g., computing lower and upper bounds on answer probabilities via dissociation of input probabilistic events [14] or running Monte Carlo simulations that aggregate the query answers over several possible worlds sampled from complex probabilistic models [20].…”

Section: In-database Techniques (mentioning)

confidence: 99%

Query Processing over Uncertain Data

Dalvi, Olteanu 2018

Encyclopedia of Database Systems


An uncertain or probabilistic database is defined as a probability distribution over a set of deterministic database instances called possible worlds. In the classical deterministic setting, the query processing problem is to compute the set of tuples representing the answer of a given query on a given database. In the probabilistic setting, this problem becomes the computation of all pairs (t, p), where the tuple t is in the query answer in some random world of the input probabilistic database with probability p.
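The (t, p) pairs described above can be approximated by sampling possible worlds and running the ordinary deterministic query on each one. The following is a minimal Python sketch of that idea, assuming a tuple-independent relation. It is not the implementation of MCDB or of any system cited here, and the relation `ORDERS`, its probabilities, and all function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical tuple-independent relation (an assumption for illustration):
# each tuple is present independently with the given marginal probability.
ORDERS = [
    (("alice", "laptop"), 0.9),
    (("alice", "phone"), 0.5),
    (("bob", "laptop"), 0.2),
]


def sample_world(relation, rng):
    """Draw one possible world: a deterministic set of tuples."""
    return {t for t, p in relation if rng.random() < p}


def query(world):
    """Deterministic query: customers with at least one order (a projection)."""
    return {customer for customer, _item in world}


def estimate_answer_probabilities(relation, n_worlds, seed=0):
    """Estimate the (t, p) pairs by counting how often t appears in the answer."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        counts.update(query(sample_world(relation, rng)))
    return {t: c / n_worlds for t, c in counts.items()}


def exact_probability(relation, customer):
    """Exact p for this projection: 1 - prod(1 - p_i) over matching tuples."""
    miss = 1.0
    for (c, _item), p in relation:
        if c == customer:
            miss *= 1.0 - p
    return 1.0 - miss


if __name__ == "__main__":
    estimates = estimate_answer_probabilities(ORDERS, n_worlds=20000, seed=1)
    for customer in sorted(estimates):
        print(customer, estimates[customer], exact_probability(ORDERS, customer))
```

For this simple projection the exact probability is available in closed form, which gives a check on the sampled estimate. Sampling becomes the practical option when the query or the underlying probabilistic model is too complex for exact evaluation.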
