2008 IEEE 24th International Conference on Data Engineering
DOI: 10.1109/icde.2008.4497428

Robust Stratified Sampling Plans for Low Selectivity Queries

citations
Cited by 25 publications
(12 citation statements)
references
References 9 publications
“…The Density-Based Distribute Data Stream Clustering (DB-DDSC) determines clusters with different structures under the Big data stream environment [17]. Recruiting similar services in the same clusters provides collaborative recommendation services to the system [18]. Notably, regular research considers the frequency of instances as the primary metric in sampling [19-21].…”
Section: Related Work (mentioning)
confidence: 99%
“…Because the data records are hidden under limited query interfaces in these systems, sampling involves very distinct challenges. Sampling for Aggregation Queries: Sampling algorithms have also been studied in the context of aggregation queries on large data bases [18], [1], [19], [25]. Approximate Pre-Aggregation (APA) [18] was proposed to estimate aggregation queries over categorical data utilizing precomputed statistics about the dataset.…”
Section: Related Work (mentioning)
confidence: 99%
“…Sampling for Aggregation Queries: Sampling algorithms have also been studied in the context of aggregation queries on large data bases [18], [1], [19], [25]. Approximate Pre-Aggregation (APA) [18] was proposed to estimate aggregation queries over categorical data utilizing precomputed statistics about the dataset. Wu et al [25] proposed a Bayesian method for guessing the extreme values in a dataset based on the learned query shape pattern and characteristics from previous workloads.…”
Section: Related Work (mentioning)
confidence: 99%
“…These include sequential sampling analysis [Haas and Swami, 1992; Hou et al, 1991], keeping additional statistics to improve the estimation [Haas and Swami, 1995], labelling the tuples and using label-dependent estimation procedures [Ganguly et al, 1996], or applying the cumulative distribution function inversion procedure [Wu et al, 2001]. Some work also looked at nonuniform sampling [Babcock et al, 2003; Estan and Naughton, 2006] and stratified sampling [Chaudhuri et al, 2007; Joshi and Jermaine, 2008]. Despite all these relevant contributions, online sampling is still considered too expensive for most applications.…”
Section: Related Work (mentioning)
confidence: 99%