Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data 1990
DOI: 10.1145/93597.93611

Practical selectivity estimation through adaptive sampling

Abstract: Recently we have proposed an adaptive, random sampling algorithm for general query size estimation. In earlier work we analyzed the asymptotic efficiency and accuracy of the algorithm; in this paper we investigate its practicality as applied to selects and joins. First, we extend our previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy. Next, we provide "sanity bounds" to deal with queries for which the underlying data is extremely skewed or the qu…

Cited by 210 publications (92 citation statements)
References: 13 publications
“…In practice, arbitrary data distributions are usually described using histograms, the idea of which is to divide the feature universe into a set of partitions such that the feature distribution of the objects within each partition is (almost) uniform [Piatetsky-Shapiro and Connell 1984; Lipton et al. 1990]. Figure 9 illustrates an example where four partitions A, B, C, D are allocated.…”
Section: General Temporal Datasets (mentioning)
confidence: 99%
“…Accordingly, we need some inexpensive technique to estimate f_c and f_r. We consider two main broadly used estimation approaches: estimation based on some default values and system statistics ([25], [23]), and estimation based on sampling techniques ([26], [27], [28]). Unlike the case with relational database systems, in string matching problems statistics about the collections of strings and how they interact with filters may not be available.…”
Section: B Cost Estimation (mentioning)
confidence: 99%
“…The focus of this work is not on ad-hoc queries but to address scenarios such as query optimizer testing where it is necessary to obtain the cardinality-optimal plan. Finally, while sampling techniques has been used to obtain selectivity estimates during query optimization (e.g., [16]), it is not applicable for our problem since we are interested in obtaining exact cardinalities.…”
Section: Related Work (mentioning)
confidence: 99%