Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/304182.304206

On random sampling over joins

Abstract: A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and attempt to analyze it in a variety of settings. We present theoretical results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insigh…
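As an illustration that a join can sometimes be sampled without materializing it, here is a minimal Python sketch of one standard strategy for a two-relation equi-join $R_1 \bowtie R_2$ on attribute $A$. It is not claimed to be the algorithm this paper proposes. Draw a tuple $t_1$ from $R_1$ with probability proportional to $m_2(t_1.A)$, the number of $R_2$ tuples sharing its join value, then pick one of those partners uniformly. Each join tuple with join value $a$ is then drawn with probability $\frac{m_2(a)}{|R_1 \bowtie R_2|} \cdot \frac{1}{m_2(a)} = \frac{1}{|R_1 \bowtie R_2|}$. The sketch assumes tuples are Python dicts and that an index on $R_2$'s join attribute is available (built here by the hypothetical helper `build_index`). It does not address the case where no such frequency information exists.

```python
import random
from collections import defaultdict


def build_index(r2, key):
    """Hypothetical helper: group R2 tuples by join value (stands in for an index on R2)."""
    index = defaultdict(list)
    for t in r2:
        index[t[key]].append(t)
    return index


def sample_join(r1, r2_index, key, n, rng=random):
    """Draw n tuples uniformly at random, with replacement, from R1 JOIN R2 on `key`.

    Sketch only: assumes r2_index maps each join value to its list of R2 tuples.
    The join result itself is never materialized.
    """
    # Weight each R1 tuple by its number of join partners, m2(t1.key).
    weights = [len(r2_index.get(t[key], ())) for t in r1]
    if sum(weights) == 0:
        return []  # the join is empty, so there is nothing to sample
    out = []
    for t1 in rng.choices(r1, weights=weights, k=n):
        # Zero-weight tuples are never chosen, so t1 has at least one partner.
        t2 = rng.choice(r2_index[t1[key]])  # uniform among t1's partners
        out.append((t1, t2))
    return out
```

For example, `sample_join(r1, build_index(r2, "a"), "a", 100)` returns 100 `(t1, t2)` pairs. Sampling with replacement keeps the draws independent of one another; the price is that the index must report partner counts for every $R_1$ tuple.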

Cited by 193 publications (176 citation statements); references 11 publications.
“…The other method uses the packets arrival times (which can be used to anticipate a traffic burst) together with knowledge about the processing time required to process a sample. [23] proposes to use the least squares estimate and a certain set of heuristic rules to determine the sampling rate. The authors in [24] describe a flow sampling approach, which allows controlling the expected volume of samples and minimizes the variance of the estimates.…”
Section: Adaptive Sampling (mentioning)
“…For join queries that access attributes from multiple datasets $R_1, \ldots, R_l$ it is conceivable to construct a result approximation or result size estimation from multiple synopses. On the other hand, it is known that this approach may lead to unbounded approximation errors [5]. Therefore, we have adopted the approach of [1] to use special join synopses for this purpose.…”
Section: Framework (mentioning)
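Joins along a foreign key are an easy case by comparison. Suppose every tuple of a fact table $R_1$ references exactly one tuple of $R_2$ through $R_2$'s primary key. Join tuples then correspond one-to-one with $R_1$ tuples, so a uniform sample of $R_1$, extended with each tuple's unique partner, is a uniform sample of $R_1 \bowtie R_2$. The Python sketch below illustrates that idea in the general spirit of a join synopsis. It is not the construction of [1], whose details are not given in this excerpt. The function and parameter names are hypothetical, and referential integrity (every foreign-key value exists in $R_2$) is assumed.

```python
import random


def fk_join_sample(fact, dim, fk, pk, n, rng=random):
    """Uniform sample (without replacement) of fact JOIN dim on fact[fk] == dim[pk].

    Hypothetical helper. Assumes pk is a primary key of dim and that every
    fact[fk] value appears in dim, so each fact tuple has exactly one partner.
    """
    by_pk = {d[pk]: d for d in dim}                # primary-key lookup into dim
    chosen = rng.sample(fact, min(n, len(fact)))   # uniform sample of the fact table
    return [(f, by_pk[f[fk]]) for f in chosen]     # attach each tuple's unique partner
```

Because the sample is drawn from a single relation, its quality does not depend on how join values are distributed, which is exactly what breaks down for arbitrary (non-key) joins.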
“…As pointed out in [5] (in the context of sampling), it is usually not feasible to estimate arbitrary join queries from approximations of the joining base relations with acceptable accuracy. For sampling, this phenomenon is discussed extensively in [5], but it does also hold for all other data reduction techniques that estimate join queries from approximations of the base relations.…”
Section: Join Synopses (mentioning)
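The failure mode described in these excerpts can be reproduced with a toy Python experiment (hypothetical data and helper names). Take independent Bernoulli samples of $R_1$ and $R_2$ at rate $p$, join them, and scale the result size by $1/p^2$. Each join tuple survives with probability $p^2$, so the estimate of $|R_1 \bowtie R_2|$ is unbiased. However, join tuples that share an $R_1$ tuple survive or vanish together. With one $R_1$ tuple matching 100 $R_2$ tuples and $p = 0.1$, the joined sample is empty whenever that single $R_1$ tuple is missed, which happens in about 90% of trials.

```python
import random


def bernoulli_sample(rel, p, rng):
    """Keep each tuple independently with probability p."""
    return [t for t in rel if rng.random() < p]


def hash_join(r1, r2, key):
    """Plain equi-join of two lists of dicts on `key`."""
    index = {}
    for t in r2:
        index.setdefault(t[key], []).append(t)
    return [(t1, t2) for t1 in r1 for t2 in index.get(t1[key], [])]


def naive_join_size_estimate(r1, r2, key, p, rng):
    """Join the two p-samples and scale by 1/p**2 (each join tuple survives w.p. p**2)."""
    joined = hash_join(bernoulli_sample(r1, p, rng), bernoulli_sample(r2, p, rng), key)
    return len(joined) / (p * p)


# Skewed toy data: the true join has exactly 100 tuples, all sharing one R1 tuple.
r1 = [{"id": 0, "a": "x"}]
r2 = [{"id": i, "a": "x"} for i in range(100)]
rng = random.Random(42)
estimates = [naive_join_size_estimate(r1, r2, "a", 0.1, rng) for _ in range(2000)]
print(sum(e == 0 for e in estimates) / len(estimates))  # fraction of empty samples, expected about 0.9
print(sum(estimates) / len(estimates))                  # expected to average out near 100
```

The average is close to the true size, but an individual estimate is usually 0 and otherwise roughly 1,000. Heavier skew makes the spread larger still.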