2006
DOI: 10.1145/1141885.1141891

Sequential reservoir sampling with a nonuniform distribution

Abstract: We present a simple algorithm that allows sampling from a stream of data items without knowing the number of items in advance and without having to store all items in main memory. The sampling distribution may be general, that is, the probability of selecting a data item i may depend on the individual item. The main advantage of the algorithms is that they have to pass through the data items only once to produce a sample of arbitrary size n. We give different var…
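
The one-pass, item-dependent sampling described in the abstract can be illustrated with a well-known key-based scheme usually attributed to Efraimidis and Spirakis (often called A-Res). This is a swapped-in technique for illustration, not the algorithm of this paper, and the resulting inclusion probabilities follow that scheme's weighted without-replacement model. The function name, the `weight` callback, and the `rng` parameter are assumptions made for this sketch.

```python
import heapq
import random
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_reservoir_sample(
    stream: Iterable[T],
    weight: Callable[[T], float],
    n: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """One-pass weighted sample of up to n items (key-based A-Res sketch).

    Each item with weight w > 0 receives the key u ** (1 / w), where u is
    uniform on [0, 1); the n items with the largest keys are kept.
    Only n entries are held in memory at any time.
    """
    if n <= 0:
        return []
    rng = rng or random.Random()
    # Min-heap of (key, arrival index, item); the index breaks ties so
    # that the items themselves are never compared.
    heap: List[Tuple[float, int, T]] = []
    for index, item in enumerate(stream):
        w = weight(item)
        if w <= 0:
            continue  # assumption: non-positive weights are never selected
        key = rng.random() ** (1.0 / w)
        entry = (key, index, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new key beats the smallest key currently in the reservoir.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

For example, `weighted_reservoir_sample(range(1000), lambda x: x + 1, 10)` reads the stream once and returns 10 distinct values, with larger values more likely to be retained.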

Cited by 17 publications (21 citation statements)
References 3 publications
“…Ideally, the sample buffers should keep a balance between sample diversity and adaptability. Motivated by this, reservoir sampling [18][19][20][21] is proposed for sequential random sampling. In principle, it aims to randomly draw some samples from a large population of samples that come in a sequential manner.…”
Section: Time-weighted Reservoir Sampling (mentioning)
confidence: 99%
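
The sequential random sampling mentioned in the statement above can be sketched with the classic uniform reservoir scheme commonly known as Algorithm R. This is a generic illustration, not code from the citing paper or from the cited references; the function name and the `rng` parameter are assumptions made for this sketch.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T],
    n: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Uniform one-pass sample of up to n items from a stream (Algorithm R)."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < n:
            # Fill phase: the first n items are stored directly.
            reservoir.append(item)
        else:
            # Item number `seen` enters with probability n / seen and
            # overwrites a uniformly chosen slot.
            j = rng.randrange(seen)
            if j < n:
                reservoir[j] = item
    return reservoir
```

Calling `reservoir_sample(sensor_readings, 100)` on any iterable (here a hypothetical `sensor_readings`) keeps at most 100 items in memory, regardless of how long the stream turns out to be.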
“…Therefore, larger weights should be assigned to the recently added samples while smaller weights should be attached with the old samples. Inspired by [20,21], we design a time-weighted reservoir sampling (TWRS) method for randomly drawing the samples according to their time-varying properties, as listed in Algorithm 3. The designed TWRS method is capable of effectively maintaining the sample buffers for online metric learning in Sec.…”
Section: Time-weighted Reservoir Sampling (mentioning)
confidence: 99%
“…Although we can directly sampling the weighted random sample R from D, but because the needs of sample merge which will be discussed later, we exploit a weighted random sampling method on data stream (called WRS, meaning Weighted Reservoir Sampling, refer to [12]) to obtain R.…”
Section: Representation Of Data Nodes (mentioning)
confidence: 99%
“…Sampling is a very natural way to summarize data properties with sublinear space; indeed, it is a key component of many streaming algorithms and techniques. Just to mention a few, the relevant papers include Aggarwal [ [10]; Bar-Yossef [13]; Bar-Yossef, Kumar and Sivakumar [17]; Buriol, Frahling, Leonardi, Marchetti-Spaccamela and Sohler [20]; Chakrabarti, Cormode and McGregor [21]; Chaudhuri and Mishra [26]; Chaudhuri, Motwani and Narasayya [27]; Cohen [29]; Cohen and Kaplan [30]; Cormode, Muthukrishnan and Rozenbaum [32]; Dasgupta, Drineas, Harb, Kumar and Mahoney [35]; Datar and Muthukrishnan [37]; Duffield, Lund and Thorup [38]; Frahling, Indyk and Sohler [43]; Gandhi, Suri and Welzl [46]; Gemulla [47]; Gemulla and Lehner [48]; Gibbons and Matias [49]; Guha, Meyerson, Mishra, Motwani and O'Callaghan [54]; Haas [55]; Kolonko and Wäsch [58]; Li [62]; Palmer and Faloutsos [67]; Szegedy [70]; and Vitter [72]. These papers illustrate the vitality of effective sampling methods for data streams. Among other methods, uniform random sampling is the most general and well-understood.…”
Section: Introduction (mentioning)
confidence: 99%