PIP: A database system for great and small expectations

Kennedy, Oliver

doi:10.1109/icde.2010.5447879

Cited by 36 publications

(40 citation statements)

References 19 publications


“…RELATED WORK There has been substantial work on uncertain data management lately (e.g., [10,28,24,16,17]). Cheng et al [7] proposed probabilistic threshold join, which is the same as our v-join semantics.…”

Section: Results For D-join (mentioning)

confidence: 99%

Join queries on uncertain data: Semantics and efficient processing

2011

2011 IEEE 27th International Conference on Data Engineering


Uncertain data is quite common nowadays in a variety of modern database applications. At the same time, the join operation is one of the most important but expensive operations in SQL. However, join queries on uncertain data have not been adequately addressed thus far. In this paper, we study the SQL join operation on uncertain attributes. We observe and formalize two kinds of join operations on such data, namely v-join and d-join. They are each useful for different applications. Using probability theory, we then devise efficient query processing algorithms for these join operations. Specifically, we use probability bounds that are based on the moments of random variables to either early accept or early reject a candidate v-join result tuple. We also devise an indexing mechanism and an algorithm called Two-End Zigzag Join to further save I/O costs. For d-join, we first observe that it can be reduced to a special form of similarity join in a multidimensional space. We then design an efficient algorithm called condensed d-join and an optimal condensation scheme based on dynamic programming. Finally, we perform a comprehensive empirical study using both real datasets and synthetic datasets.
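The early accept / early reject idea in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a band predicate |X - Y| <= eps with probability threshold tau, independent attributes, and plain Chebyshev bounds on the first two moments. The names `Moments` and `prune_band_join` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moments:
    """First two moments of an uncertain attribute (hypothetical helper type)."""
    mean: float
    var: float


def prune_band_join(x: Moments, y: Moments, eps: float, tau: float) -> str:
    """Decide whether P(|X - Y| <= eps) >= tau using only means and variances.

    Returns "accept", "reject", or "unknown" (the exact probability
    would then have to be computed some other way).
    Assumes X and Y are independent, so Var(X - Y) = Var(X) + Var(Y).
    """
    mu = x.mean - y.mean
    var = x.var + y.var
    gap = abs(mu) - eps
    if gap > 0:
        # |D| <= eps forces |D - mu| >= gap, so Chebyshev bounds the
        # match probability from above by var / gap^2.
        upper = min(1.0, var / gap ** 2)
        if upper < tau:
            return "reject"
    elif gap < 0:
        # |D| > eps forces |D - mu| > -gap, so Chebyshev bounds the
        # miss probability from above, giving a lower bound on a match.
        lower = max(0.0, 1.0 - var / gap ** 2)
        if lower >= tau:
            return "accept"
    return "unknown"


far_apart = prune_band_join(Moments(10.0, 1.0), Moments(0.0, 1.0), eps=1.0, tau=0.5)
tight = prune_band_join(Moments(0.0, 0.01), Moments(0.0, 0.01), eps=1.0, tau=0.9)
unclear = prune_band_join(Moments(1.0, 1.0), Moments(0.0, 1.0), eps=1.5, tau=0.5)
```

Under these assumptions, only the pairs that come back as "unknown" need an exact probability computation, which is where the savings would come from.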


“…Green et al [24] studied probabilistic versions of C-tables. Virtual C-tables generalize C-tables [30,49] by allowing symbolic expressions as values.…”

Section: Related Work (mentioning)

confidence: 99%

Uncertainty Annotated Databases - A Lightweight Approach for Approximating Certain Answers

Huber, Glavic, et al.

2019

Proceedings of the 2019 International Conference on Management of Data

Self Cite


Certain answers are a principled method for coping with uncertainty that arises in many practical data management tasks. Unfortunately, this method is expensive and may exclude useful (if uncertain) answers. Thus, users frequently resort to less principled approaches to resolve the uncertainty. In this paper, we propose Uncertainty Annotated Databases (UA-DBs), which combine an under- and over-approximation of certain answers to achieve the reliability of certain answers, with the performance of a classical database system. Furthermore, in contrast to prior work on certain answers, UA-DBs achieve a higher utility by including some (explicitly marked) answers that are not certain. UA-DBs are based on incomplete K-relations, which we introduce to generalize the classical set-based notions of incomplete databases and certain answers to a much larger class of data models. Using an implementation of our approach, we demonstrate experimentally that it efficiently produces tight approximations of certain answers that are of high utility.


“…Query evaluation over probabilistic databases corresponds to solving the weighted model counting problem, and current approaches can be classified into three categories (Fig. 20): (1) incomplete approaches identify tractable cases either at the query-level [13,14,24,54] or the data-level [53,65,69] and ignore the rest; (2) exact approaches [2,43,68] are based on variants and extensions of a complete search based on the DPLL procedure [35] and work well for queries over databases with simple lineage expressions, but perform poorly on complex lineage expressions; and (3) approximate approaches usually first compute the lineage of the query on the given database to obtain a Boolean formula, then either apply variants of Monte Carlo sampling methods [42,45,46,63], or approximate the number of models of the Boolean lineage expression [23,55,64]. A recent approach combines safe plans with Monte Carlo simulation [38].…”

Section: Related Work (mentioning)

confidence: 99%

Dissociation and propagation for approximate lifted inference with standard relational database management systems

Gatterbauer, Suciu

2016

The VLDB Journal


Probabilistic inference over large data sets is a challenging data management problem since exact inference is generally #P-hard and is most often solved approximately with sampling-based methods today. This paper proposes an alternative approach for approximate evaluation of conjunctive queries with standard relational databases: In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known PTIME self-join-free conjunctive queries: A query is in PTIME if and only if our algorithm returns one single plan. Furthermore, our approach is a generalization of a family of efficient ranking methods from graphs to hypergraphs. We also adapt three relational query optimization techniques to evaluate all necessary plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers. We also note that the techniques developed in this paper apply immediately to lifted inference from statistical relational models since lifted inference corresponds to PTIME plans in probabilistic databases.
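For context, the sampling-based baseline that this abstract contrasts itself with can be sketched in a few lines. The sketch below is naive Monte Carlo estimation over a lineage formula, not the dissociation method the paper proposes. It assumes the lineage is given in DNF as a list of clauses, that each variable is an independent tuple event, and that the variable names (`r1`, `s1`, `s2`) and probabilities are made up for illustration. A brute-force enumeration is included only to check the estimate on a tiny input.

```python
import random
from itertools import product


def satisfied(dnf, world):
    """True if at least one conjunctive clause holds in the sampled world."""
    return any(all(world[v] for v in clause) for clause in dnf)


def mc_probability(dnf, probs, samples=20000, seed=0):
    """Estimate P(lineage) by sampling independent tuple events."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {v: rng.random() < p for v, p in probs.items()}
        if satisfied(dnf, world):
            hits += 1
    return hits / samples


def exact_probability(dnf, probs):
    """Sum the weights of all satisfying worlds (exponential; tiny inputs only)."""
    names = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if satisfied(dnf, world):
            weight = 1.0
            for v in names:
                weight *= probs[v] if world[v] else 1.0 - probs[v]
            total += weight
    return total


# Hypothetical lineage (r1 AND s1) OR (r1 AND s2) with made-up probabilities.
lineage = [("r1", "s1"), ("r1", "s2")]
tuple_probs = {"r1": 0.5, "s1": 0.4, "s2": 0.6}

exact = exact_probability(lineage, tuple_probs)
estimate = mc_probability(lineage, tuple_probs)
```

For this input the exact value works out to 0.5 * (1 - 0.6 * 0.4) = 0.38, and the sampled estimate should land close to it. The enumeration grows exponentially with the number of variables, which is the practical reason sampling or bounding techniques are used instead.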
