Weighted Random Sampling over Data Streams

Efraimidis, Pavlos S.

doi:10.1007/978-3-319-24024-4_12

Cited by 32 publications

(23 citation statements)

References 15 publications

“…In order to translate our algorithm into a single-pass algorithm with a space bound even independent of n (though exponential in d), note that by Lemma 10 in both of the cases in which our algorithm operates, we only need a constant size sample of the elements in order to get a good approximation. In the first case we need to sample $s = \Theta\left(\frac{1}{\varepsilon^2} \log \frac{1}{\varepsilon\delta}\right)$ of the locations $q_{ij} = \perp$ proportional to their probabilities $p_{ij}$ with repetition which can be done by running s independent copies of the weighted sampling algorithm by Chao [7] which is a straightforward generalization of the well-known reservoir sampling approach [36] to the weighted case; see also [10]. At the same time we also sample everything we need for the second case.…”

Section: Extensions To the Streaming Setting (mentioning)

confidence: 98%

Smallest enclosing ball for probabilistic data

Munteanu

Sohler

Feldman

2014

Proceedings of the Thirtieth Annual Symposium on Computational Geometry

This paper deals with computing the smallest enclosing ball of a set of points subject to probabilistic data. In our setting, any of the n points may or may not occur at one of finitely many locations, following its own discrete probability distribution. The objective is therefore considered to be a random variable and we aim at finding a center minimizing the expected maximum distance to the points according to their distributions. Our main contribution presented in this paper is the first polynomial time (1 + ε)-approximation algorithm for the probabilistic smallest enclosing ball problem with extensions to the streaming setting.
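
The sampling step described in the citation statement above (s draws with repetition, each proportional to weight, in a single pass) can be illustrated with a short sketch. This is not code from the cited paper: the function name `weighted_sample_with_repetition`, the `(item, weight)` stream format, and the `rng` parameter are assumptions made for illustration. Each of the s slots acts as an independent size-one weighted reservoir: when an item of weight w arrives and the running total weight is W, the slot switches to that item with probability w / W.

```python
import random
from typing import Hashable, Iterable, List, Optional, Tuple


def weighted_sample_with_repetition(
    stream: Iterable[Tuple[Hashable, float]],
    s: int,
    rng: Optional[random.Random] = None,
) -> List[Optional[Hashable]]:
    """Draw s items in one pass, each slot independently proportional to weight.

    Hypothetical helper: it runs s independent size-one weighted reservoirs,
    so the same item may be returned more than once.
    """
    rng = rng or random.Random()
    reservoirs: List[Optional[Hashable]] = [None] * s
    total = 0.0  # running sum of the weights seen so far
    for item, weight in stream:
        if weight <= 0:
            continue  # items without positive weight can never be selected
        total += weight
        p_replace = weight / total  # equals 1.0 for the first positive-weight item
        for j in range(s):
            # Each slot flips its own coin, which keeps the s draws independent.
            if rng.random() < p_replace:
                reservoirs[j] = item
    return reservoirs


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 3.0), ("c", 6.0)]
    print(weighted_sample_with_repetition(data, s=5, rng=random.Random(0)))
```

The memory used is s slots plus one running total, independent of the stream length. Slots stay `None` only if the stream contains no item with positive weight.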

“…The second type is in-class negative samples, which are the negative samples that are in the same category as $p_i$ but is less relevant to $p_i$ than $p_i^+$. Since we are more interested in the top-ranked images, we draw in-class negative samples $p_i^-$ with the same distribution as (7). In order to ensure robust ordering between $p_i^+$ and $p_i^-$ in a triplet $t_i = (p_i, p_i^+, p_i^-)$, we also require that the margin between the relevance score $r_{i,i^+}$ and $r_{i,i^-}$ should be larger than $T_r$, i.e.,…”

Section: Triplet Sampling (mentioning)

confidence: 99%

Learning Fine-Grained Image Similarity with Deep Ranking

Wang

Song

Leung

et al. 2014

2014 IEEE Conference on Computer Vision and Pattern Recognition

“…Weighted random sampling was studied in [7] (see also references within). While one of the sampling forms used here fits our framework, the underlying algorithms differ from ours, and in particular use many more invocations of the randomness function than our technique (see discussion in Section III).…”

Section: Related Work (mentioning)

confidence: 99%

The case for sampling on very large file systems

Goldberg

Harnik

Sotnikov

2014

2014 30th Symposium on Mass Storage Systems and Technologies (MSST)

Sampling has long been a prominent tool in statistics and analytics, first and foremost when very large amounts of data are involved. In the realm of very large file systems (and hierarchical data stores in general), however, sampling has mostly been ignored and for several good reasons. Mainly, running sampling in such an environment introduces technical challenges that make the entire sampling process non-beneficial. In this work we demonstrate that there are cases for which sampling is very worthwhile in very large file systems. We address this topic in two aspects: (a) the technical side where we design and implement solutions to efficient weighted sampling that is also distributed, one-pass and addresses multiple efficiency aspects; and (b) the usability aspect in which we demonstrate several use-cases in which weighted sampling over large file systems is extremely beneficial. In particular, we show use-cases regarding estimation of compression ratios, testing and auditing and offline collection of statistics on very large data stores.
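
For context on what one-pass weighted sampling can look like, here is a minimal sketch of the key-based reservoir scheme usually called A-Res and commonly attributed to Efraimidis and Spirakis. It is not the technique of the file-system paper above, whose algorithms are not given on this page. The function name `a_res`, the `(item, weight)` stream format, and the `rng` parameter are assumptions made for illustration. Each item with weight w gets a key u^(1/w), with u drawn uniformly from [0, 1), and the k items with the largest keys form a weighted sample without replacement. The scheme makes one random draw per positive-weight item.

```python
import heapq
import random
from typing import Hashable, Iterable, List, Optional, Tuple


def a_res(
    stream: Iterable[Tuple[Hashable, float]],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """One-pass weighted sample of up to k items without replacement.

    Hypothetical helper sketching the A-Res idea: keep the k largest keys,
    where an item's key is u ** (1 / weight) for a fresh uniform u.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    # Min-heap of (key, arrival index, item); the index breaks key ties so
    # that the items themselves are never compared.
    heap: List[Tuple[float, int, Hashable]] = []
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # items without positive weight are never sampled
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, index, item))
        elif key > heap[0][0]:
            # The new key beats the smallest key currently kept.
            heapq.heapreplace(heap, (key, index, item))
    return [item for _, _, item in heap]


if __name__ == "__main__":
    files = [("small.txt", 1.0), ("medium.bin", 10.0), ("large.iso", 100.0)]
    print(a_res(files, k=2, rng=random.Random(0)))
```

The heap holds at most k entries, so memory does not grow with the stream. The returned list is in heap order, not in order of arrival or weight.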
