Approximating predicates and expressive queries on probabilistic databases

Koch, Christoph

doi:10.1145/1376916.1376932

Cited by 31 publications

(57 citation statements)

References 21 publications


“…It was first shown by Karp, Luby, and Madras [33] that there is a fully polynomial-time randomised approximation scheme (FPTRAS) for DNF counting based on Monte Carlo simulation. This algorithm can be modified to compute the probability of a DNF over independent discrete random variables [12,25,34,46]. These techniques yield an efficiently computable unbiased estimator that in expectation returns the probability $p$ of a DNF of $n$ clauses, such that computing the average of a polynomial number of such Monte Carlo steps (which are calls to the Karp-Luby unbiased estimator) is an $(\epsilon, \delta)$-approximation for the probability (i.e., a relative approximation): if the average $\hat{p}$ is taken over at least $3 \cdot n \cdot \log(2/\delta)/\epsilon^2$ Monte Carlo steps, then $\Pr\big[|p - \hat{p}| \geq \epsilon \cdot p\big] \leq \delta$.…”
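To make the quoted estimator concrete, here is a minimal sketch of the Karp-Luby-Madras scheme for a DNF over independent Boolean variables. The Boolean restriction (the quote allows general discrete variables), the clause-as-dict encoding, and the names `clause_prob` and `karp_luby_estimate` are assumptions made for illustration. Each step picks a clause with probability proportional to its weight, samples a world in which that clause holds, and counts the step only if that clause is the first one the world satisfies; scaling the hit rate by the sum of clause weights gives an unbiased estimate of $p$, and the number of steps follows the quoted bound (reading $\log$ as the natural logarithm, also an assumption).

```python
import math
import random

# Hypothetical encoding: a DNF is a list of clauses, each a dict
# {variable: required_bool}; probs[v] = Pr[v is True], variables independent.

def clause_prob(clause, probs):
    """Probability that every literal of the clause holds."""
    p = 1.0
    for var, val in clause.items():
        p *= probs[var] if val else 1.0 - probs[var]
    return p

def karp_luby_estimate(dnf, probs, eps, delta, rng=random):
    """Average of Karp-Luby-Madras steps approximating Pr[dnf]."""
    weights = [clause_prob(c, probs) for c in dnf]
    total = sum(weights)  # union bound U >= Pr[dnf]
    if total == 0.0:
        return 0.0
    n = len(dnf)
    # Number of steps from the quoted bound 3 * n * log(2/delta) / eps^2.
    steps = math.ceil(3 * n * math.log(2 / delta) / eps ** 2)
    hits = 0
    for _ in range(steps):
        # Choose clause i with probability weights[i] / U.
        i = rng.choices(range(n), weights=weights)[0]
        # Sample a world conditioned on clause i being true.
        world = {v: dnf[i][v] if v in dnf[i] else rng.random() < p
                 for v, p in probs.items()}
        # Count the step only if clause i is the first clause the world
        # satisfies, so every satisfying world is counted exactly once.
        first = next(j for j, c in enumerate(dnf)
                     if all(world[v] == b for v, b in c.items()))
        hits += (first == i)
    # Each step is the unbiased estimator U * [first == i]; return the average.
    return total * hits / steps
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible; the default uses the module-level generator.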

Section: Related Work (mentioning)


Anytime approximation in probabilistic databases

2013


This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
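The abstract describes refining lower and upper bounds until an error guarantee is met. The sketch below illustrates that anytime idea on a DNF with a deliberately simple scheme chosen for this example, not SPROUT's actual algorithm: Shannon expansion on one variable at a time, with each unexpanded subformula bounded below by its most probable clause and above by the capped sum of its clause probabilities. It handles only the absolute-error case, and the encoding and the names `bounds`, `condition`, and `anytime_probability` are hypothetical.

```python
import heapq
import itertools

# Hypothetical encoding: a DNF is a list of clauses, each a dict
# {variable: required_bool}; probs[v] = Pr[v is True], variables independent.

def clause_prob(clause, probs):
    p = 1.0
    for var, val in clause.items():
        p *= probs[var] if val else 1.0 - probs[var]
    return p

def bounds(dnf, probs):
    # Pr[F] lies between its most probable clause and the capped union bound.
    ps = [clause_prob(c, probs) for c in dnf]
    return max(ps, default=0.0), min(1.0, sum(ps))

def condition(dnf, var, value):
    # Fix var := value: drop falsified clauses, delete satisfied literals.
    # An empty clause {} is always true, so [{}] encodes the constant True.
    return [{v: b for v, b in c.items() if v != var}
            for c in dnf if c.get(var, value) == value]

def anytime_probability(dnf, probs, eps):
    """Refine [lower, upper] on Pr[dnf] until upper - lower <= 2 * eps."""
    tie = itertools.count()  # tie-breaker so heapq never compares dicts
    lo, hi = bounds(dnf, probs)
    lower, upper = lo, hi
    open_leaves = []
    if hi > lo:
        open_leaves.append((-(hi - lo), next(tie), 1.0, dnf, lo, hi))
    while open_leaves and upper - lower > 2 * eps:
        # Expand the leaf with the widest probability-weighted gap.
        _, _, w, f, lo, hi = heapq.heappop(open_leaves)
        lower -= w * lo
        upper -= w * hi
        var = next(iter(f[0]))  # a positive gap implies no clause is empty
        for value, pv in ((True, probs[var]), (False, 1.0 - probs[var])):
            g = condition(f, var, value)
            glo, ghi = bounds(g, probs)
            lower += w * pv * glo
            upper += w * pv * ghi
            if ghi > glo:
                heapq.heappush(open_leaves,
                               (-(w * pv * (ghi - glo)), next(tie), w * pv, g, glo, ghi))
    return lower, upper
```

Stopping at any iteration still yields valid bounds; the midpoint of the returned interval is then within `eps` of the true probability.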



“…These systems model data with relations and therefore, they cannot perform shortest path computations on graphs efficiently. Also, since computing exact answers to many typical SQL queries has been shown to have #P-complete data complexity [13], research has focused on computing approximate answers [25,34].…”

Section: Related Work (mentioning)


k-nearest neighbors in uncertain graphs

et al. 2010


Complex networks, such as biological, social, and communication networks, often entail uncertainty, and thus, can be modeled as probabilistic graphs. Similar to the problem of similarity search in standard graphs, a fundamental problem for probabilistic graphs is to efficiently answer k-nearest neighbor queries (k-NN), which is the problem of computing the k closest nodes to some specific node. In this paper we introduce a framework for processing k-NN queries in probabilistic graphs. We propose novel distance functions that extend well-known graph concepts, such as shortest paths. In order to compute them in probabilistic graphs, we design algorithms based on sampling. During k-NN query processing we efficiently prune the search space using novel techniques. Our experiments indicate that our distance functions outperform previously used alternatives in identifying true neighbors in real-world biological data. We also demonstrate that our algorithms scale for graphs with tens of millions of edges.
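The abstract says only that the distance functions extend shortest paths and are computed by sampling. As an illustration under assumptions of our own, the sketch below samples possible worlds of an undirected graph whose edges exist independently, runs Dijkstra in each world, and ranks nodes by the median of their sampled shortest-path distances, treating unreachable nodes as infinitely far. The median-distance choice, the `(u, v, weight, probability)` edge encoding, and the names `dijkstra` and `sampled_knn` are hypothetical, and none of the paper's pruning techniques are included.

```python
import heapq
import math
import random
import statistics
from collections import defaultdict

def dijkstra(adj, source):
    """Shortest-path distances from source in one sampled world."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def sampled_knn(edges, source, k, samples=1000, rng=None):
    """Return the k nodes with the smallest median sampled distance.

    edges: (u, v, weight, probability) tuples of an undirected graph whose
    edges exist independently (assumption); node names must be comparable.
    """
    rng = rng or random.Random()
    nodes = {u for u, _, _, _ in edges} | {v for _, v, _, _ in edges}
    observed = defaultdict(list)
    for _ in range(samples):
        # Draw one possible world: keep each edge with its probability.
        adj = defaultdict(list)
        for u, v, w, p in edges:
            if rng.random() < p:
                adj[u].append((v, w))
                adj[v].append((u, w))
        dist = dijkstra(adj, source)
        for node in nodes - {source}:
            observed[node].append(dist.get(node, math.inf))  # unreachable -> inf
    median = {n: statistics.median_low(ds) for n, ds in observed.items()}
    # Ties are broken by node name so the ranking is deterministic.
    return sorted(median, key=lambda n: (median[n], n))[:k]
```

`statistics.median_low` is used instead of `statistics.median` so that the reported distance is always one that was actually observed in some sampled world.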


“…Koch [30] formalizes a language that allows predication on probabilities and discusses approximation algorithms for this richer language, though he does not consider HAVING aggregation. This is in part due to the fact that his aim is to create a fully compositional language for probabilistic databases [31].…”
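A predicate on a probability, such as keeping a tuple only if its confidence is at least a threshold θ, inherits the hardness of computing that probability, which for many queries is #P-complete. One generic way to approximate such a predicate, shown here as an illustrative sketch and not as the algorithm from Koch's paper, is to estimate the probability by Monte Carlo sampling with enough samples for Hoeffding's inequality and to report when the estimate is too close to θ to decide. The names `approx_threshold` and `sample_event` are hypothetical.

```python
import math
import random

def approx_threshold(sample_event, theta, eps, delta, rng):
    """Approximately decide whether Pr[event] >= theta.

    sample_event(rng) draws one possible world and returns True if the event
    holds in it. With n >= ln(2/delta) / (2 * eps**2) samples, Hoeffding's
    inequality gives |p_hat - p| < eps with probability at least 1 - delta.
    Returns (answer, decided); decided is False when p_hat lies within eps
    of theta, where that guarantee does not settle the comparison.
    """
    n = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(1 for _ in range(n) if sample_event(rng))
    p_hat = hits / n
    return p_hat >= theta, abs(p_hat - theta) >= eps
```

When `decided` is True, the answer is correct with probability at least `1 - delta`; otherwise a caller could retry with a smaller `eps`.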

Section: Related Work (mentioning)


The trichotomy of HAVING queries on a probabilistic database

Ré

Suciu

2009

The VLDB Journal


We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING in SQL) on probabilistic databases. More precisely, we study conjunctive queries with predicate aggregates on probabilistic databases where the aggregation function is one of MIN, MAX, EXISTS, COUNT, SUM, AVG, or COUNT(DISTINCT) and the comparison function is one of =, ≠, ≥, >, ≤, or <. The complexity of evaluating a HAVING query depends on the aggregation function, α, and the comparison function, θ. In this paper, we establish a set of trichotomy results for conjunctive queries with HAVING predicates parametrized by (α, θ). For such queries (without self joins), one of the following three statements is true: (1) The exact evaluation problem has P-time data complexity. In this case, we call the query safe. (2) The exact evaluation problem is #P-hard, but the approximate evaluation problem has (randomized) P-time data complexity. More precisely, there exists an FPTRAS for the query. In this case, we call the query apx-safe. (3) The exact evaluation problem is #P-hard, and the approximate evaluation problem is also hard. We call these queries hazardous. The precise definition of each class depends on the aggregate considered and the comparison function. Thus, we have queries that are (MAX, ≥)-safe, (COUNT, ≤)-apx-safe, (SUM, =)-hazardous, etc. Our trichotomy result is a significant extension of a previous dichotomy result for Boolean conjunctive queries into safe and not safe. For each of the three classes we present novel techniques. For safe queries, we describe an evaluation algorithm that uses random variables over semirings. For apx-safe queries, we describe an FPTRAS that relies on a novel algorithm for generating a random possible world satisfying a given condition. Finally, for hazardous queries we give novel proofs of hardness of approximation. The results for safe queries were previously announced [43], but all other results are new.
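The abstract lists (MAX, ≥) as a safe pair. The simplest possible instance, a single tuple-independent relation R(v) with no joins, hints at why: MAX(v) ≥ k holds exactly when at least one tuple with value at least k is present, so its probability is $1 - \prod_{t : v_t \geq k} (1 - p_t)$, computable in linear time. The sketch below checks that closed form against brute-force enumeration of all $2^n$ possible worlds. It covers only this special case, not the semiring-based algorithm the abstract describes, and the `(value, probability)` encoding and function names are our own.

```python
import itertools
from math import prod

# Hypothetical encoding: a tuple-independent relation R(v) given as a list of
# (value, probability) pairs; each tuple is present independently.

def prob_max_at_least(tuples, k):
    """Pr[MAX(v) >= k]: true exactly when some tuple with value >= k is present."""
    return 1.0 - prod(1.0 - p for v, p in tuples if v >= k)

def prob_max_at_least_brute(tuples, k):
    """Same probability by enumerating all 2^n possible worlds (for checking)."""
    total = 0.0
    for present in itertools.product([False, True], repeat=len(tuples)):
        weight = prod(p if keep else 1.0 - p
                      for keep, (_, p) in zip(present, tuples))
        values = [v for keep, (v, _) in zip(present, tuples) if keep]
        # An empty world has no MAX, so the HAVING test fails there.
        if values and max(values) >= k:
            total += weight
    return total
```

The brute-force version takes exponential time in the number of tuples, which is the cost the closed form avoids for this safe case.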

