Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/1007568.1007603
Online maintenance of very large random samples

Abstract: Random sampling is one of the most fundamental data management tools available. However, most current research involving sampling considers the problem of how to use a sample, and not how to compute one. The implicit assumption is that a "sample" is a small data structure that is easily maintained as new data are encountered, even though simple statistical arguments demonstrate that very large samples of gigabytes or terabytes in size can be necessary to provide high accuracy. No existing work tackles the prob…
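
The full text details a disk-based structure (the Geometric File, discussed in the citation statements below) for maintaining such samples. For intuition only, the classic in-memory baseline for maintaining a fixed-size uniform random sample over a stream is reservoir sampling (Vitter's Algorithm R); the sketch below illustrates that baseline, not the paper's disk-based method.

import random

def reservoir_sample(stream, k):
    """Maintain a uniform random sample of k items over a stream.

    Classic Algorithm R, shown here only as the in-memory baseline
    that disk-based schemes like the Geometric File scale up; an
    illustrative sketch, not the paper's algorithm.
    """
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir first
        else:
            j = random.randint(0, i)   # uniform over 0..i inclusive
            if j < k:                  # true with probability k/(i+1)
                sample[j] = item       # evict a uniformly chosen resident
    return sample

# Example: a 5-element uniform sample of the first 10,000 integers.
print(reservoir_sample(range(10_000), 5))

At gigabyte or terabyte sample sizes the reservoir no longer fits in memory, and the in-place overwrites above become random disk I/O; that is the bottleneck the paper's Geometric File is designed to avoid.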

Cited by 37 publications (23 citation statements), spanning 2006 to 2024. References 30 publications. Citation statements below are ordered by relevance.
“…The fastest streaming algorithm for maintaining a large fixed-size random sample on a magnetic disk is due to Jermaine et al [11]. The algorithm uses an abstraction called the Geometric File.…”
Section: Related Work (mentioning)
confidence: 99%
“…In [11], the authors also propose using multiple geometric files in parallel for reducing the number of disk head movements. However, on flash, this scheme may not add a significant benefit since it does not reduce the amount of data overwrite and flash devices do not have any mechanical head movements.…”
Section: Related Work (mentioning)
confidence: 99%