2015
DOI: 10.1007/978-3-319-24024-4_12

Weighted Random Sampling over Data Streams

Abstract: In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([3,8]), discuss sampling with and without replacement, and show adaptations of the algorithms for several WRS problems and evolving data streams.
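
To make "weighted random sampling over a stream without replacement" concrete, here is a minimal sketch of one well-known one-pass approach: the key-based reservoir scheme in which each item draws a key u^(1/w) and the k largest keys are kept. Whether this corresponds to reference [3] or [8] in the abstract is an assumption, and the function name and the (item, weight) stream format are hypothetical choices made for illustration.

```python
import heapq
import random


def weighted_sample_without_replacement(stream, k, rng=None):
    """One-pass weighted sample of up to k items, without replacement.

    `stream` yields (item, weight) pairs; items with weight <= 0 are skipped.
    Assumption: key-based reservoir scheme (key = u ** (1 / weight)).
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival index, item); smallest key on top
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new key beats the smallest key in the reservoir: swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

The reservoir never holds more than k entries, so memory use does not depend on the stream length. Heavier items tend to draw larger keys and are therefore more likely to survive.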

Citations: cited by 32 publications (23 citation statements)
References: 15 publications
“…In order to translate our algorithm into a single-pass algorithm with a space bound even independent of n (though exponential in d), note that by Lemma 10 in both of the cases in which our algorithm operates, we only need a constant size sample of the elements in order to get a good approximation. In the first case we need to sample s = Θ((1/ε²) log(1/(εδ))) of the locations q_ij = ⊥ proportional to their probabilities p_ij with repetition which can be done by running s independent copies of the weighted sampling algorithm by Chao [7] which is a straightforward generalization of the well-known reservoir sampling approach [36] to the weighted case; see also [10]. At the same time we also sample everything we need for the second case.…”
Section: Extensions To the Streaming Setting
Citation type: mentioning
confidence: 98%
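
The idea of sampling with repetition by running s independent copies of a weighted reservoir sampler can be sketched as follows. Each copy keeps a single item and replaces it with probability w / W, where W is the total weight seen so far. Treating this as the size-one case of the procedure cited as Chao [7] is an assumption, since the quoted statement does not spell the algorithm out; the function name and the (item, weight) stream format are hypothetical.

```python
import random


def weighted_sample_with_replacement(stream, s, rng=None):
    """One-pass weighted sample of s items, with replacement.

    Runs s independent single-item weighted reservoirs over `stream`,
    which yields (item, weight) pairs; weights <= 0 are skipped.
    """
    rng = rng or random.Random()
    picks = [None] * s  # current choice of each independent copy
    total = 0.0         # running sum of weights seen so far
    for item, weight in stream:
        if weight <= 0:
            continue
        total += weight
        p = weight / total  # equals 1.0 for the first item, so it is always taken
        for j in range(s):
            if rng.random() < p:
                picks[j] = item
    return picks
```

After the pass, each copy holds item i with probability w_i divided by the total weight, because the replacement probabilities telescope. The copies use independent random draws, so the same item can appear several times in the result.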
“…The second type is in-class negative samples, which are the negative samples that are in the same category as p_i but is less relevant to p_i than p_i^+. Since we are more interested in the top-ranked images, we draw in-class negative samples p_i^- with the same distribution as (7). In order to ensure robust ordering between p_i^+ and p_i^- in a triplet t_i = (p_i, p_i^+, p_i^-), we also require that the margin between the relevance score r_{i,i+} and r_{i,i-} should be larger than T_r, i.e.,…”
Section: Triplet Sampling
Citation type: mentioning
confidence: 99%
“…Weighted random sampling was studied in [7] (see also references within). While the one of the sampling forms used here fits our framework, the underlying algorithms differ from ours, and in particular use much more invocations of the randomness function than our technique (see discussion in Section III).…”
Section: Related Work
Citation type: mentioning
confidence: 99%