2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) 2010
DOI: 10.1109/icde.2010.5447879

PIP: A database system for great and small expectations

Abstract: Estimation via sampling out of highly selective join queries is well known to be problematic, most notably in online aggregation. Without goal-directed sampling strategies, samples falling outside of the selection constraints lower estimation efficiency at best, and cause inaccurate estimates at worst. This problem appears in general probabilistic database systems, where query processing is tightly coupled with sampling. By committing to a set of samples before evaluating the query, the engine wastes effort on…

Cited by 36 publications (40 citation statements)
References 19 publications
“…RELATED WORK There has been substantial work on uncertain data management lately (e.g., [10,28,24,16,17]). Cheng et al [7] proposed probabilistic threshold join, which is the same as our v-join semantics.…”
Section: Results For D-join (mentioning)
confidence: 99%
“…Green et al [24] studied probabilistic versions of C-tables. Virtual C-tables generalize C-tables [30,49] by allowing symbolic expressions as values.…”
Section: Related Work (mentioning)
confidence: 99%
“…Query evaluation over probabilistic databases corresponds to solving the weighted model counting problem, and current approaches can be classified into three categories (Fig. 20): (1) incomplete approaches identify tractable cases either at the query-level [13,14,24,54] or the data-level [53,65,69] and ignore the rest; (2) exact approaches [2,43,68] are based on variants and extensions of a complete search based on the DPLL procedure [35] and work well for queries over databases with simple lineage expressions, but perform poorly on complex lineage expressions; and (3) approximate approaches usually first compute the lineage of the query on the given database to obtain a Boolean formula, then either apply variants of Monte Carlo sampling methods [42,45,46,63], or approximate the number of models of the Boolean lineage expression [23,55,64]. A recent approach combines safe plans with Monte Carlo simulation [38].…”
Section: Related Work (mentioning)
confidence: 99%
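
The last citation statement above describes approximate approaches that compute a Boolean lineage formula and then estimate its probability by Monte Carlo sampling. The following is a minimal sketch of that general idea, not the method of PIP or of any cited system. It assumes the lineage is in disjunctive normal form over independent tuple variables; the names `lineage_holds`, `monte_carlo_probability`, `exact_probability`, and the example variables `t1`, `t2`, `t3` are hypothetical and chosen for illustration only.

```python
import itertools
import random


def lineage_holds(clauses, world):
    """Return True if the DNF lineage is satisfied in the given world.

    clauses: list of tuples of variable names (each tuple is a conjunction).
    world:   dict mapping variable name -> bool (tuple present or absent).
    """
    return any(all(world[v] for v in clause) for clause in clauses)


def monte_carlo_probability(clauses, probs, n_samples=20000, seed=0):
    """Naive Monte Carlo estimate: sample worlds, count those satisfying the lineage.

    Assumes the variables are independent (an assumption of this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        world = {v: rng.random() < p for v, p in probs.items()}
        if lineage_holds(clauses, world):
            hits += 1
    return hits / n_samples


def exact_probability(clauses, probs):
    """Exact probability by enumerating all worlds (feasible only for tiny inputs)."""
    names = list(probs)
    total = 0.0
    for values in itertools.product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if lineage_holds(clauses, world):
            weight = 1.0
            for v in names:
                weight *= probs[v] if world[v] else 1.0 - probs[v]
            total += weight
    return total


# Hypothetical lineage: (t1 AND t2) OR t3, each tuple present with probability 0.5.
clauses = [("t1", "t2"), ("t3",)]
probs = {"t1": 0.5, "t2": 0.5, "t3": 0.5}
estimate = monte_carlo_probability(clauses, probs)
exact = exact_probability(clauses, probs)
print(f"estimate={estimate:.3f} exact={exact:.3f}")
```

For this small formula the exact value is 0.625, so the estimate can be checked directly. When the lineage is satisfied only rarely, most sampled worlds contribute nothing to the count, which is the inefficiency for highly selective queries that the abstract describes.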