Range‐Efficient Counting of Distinct Elements in a Massive Data Stream

Pavan, A.; Tirthapura, Srikanta

doi:10.1137/050643672

Cited by 33 publications

(44 citation statements)

References 14 publications

(21 reference statements)


“…For p ≤ 1, there is no expectation, while for general p these are heavy-tailed, so there is a nonnegligible (1/poly(n)) probability of observing a value that is poly(n) times c. Nevertheless, the references above give (ε, δ)-approximations for these problems in O*(1) space, and by our transformation, we obtain near-optimal PAPs. We also get a near-optimal PAP for the related distinct summation problem in sensor networks [53], which also does not have a sharply concentrated NBE. Here, for each j ∈ [n] there is a v_j ∈ {1, .…”

Section: Repeat the Following (mentioning)

confidence: 99%

“…, M}^n, and the max-dominance norm is $\sum_{j=1}^{n} \max(x_j, y_j)$. This problem, and its generalization, the dominant ℓ_p-norm, defined as $\left(\sum_{j=1}^{n} \max(x_j, y_j)^p\right)^{1/p}$ for p > 0, are studied in [19,53,56,57,58] (in [40] this problem is instead studied for p < 0, which is useful for coordinatewise minima). There are no sharply concentrated NBEs known for p > 0.…”

Section: Repeat the Following (mentioning)

confidence: 99%
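The definition quoted in the statement above can be checked on a small input. The following is a minimal sketch that evaluates the formula exactly on two in-memory vectors; it is not the private approximation protocol or any streaming sketch from the cited works, and the function name `dominant_p_norm` is a hypothetical one chosen for illustration.

```python
def dominant_p_norm(x, y, p=1.0):
    """Exact value of (sum_j max(x_j, y_j)^p)^(1/p) for p > 0.

    With p = 1 this is the max-dominance norm sum_j max(x_j, y_j).
    Illustrative helper only; the name and signature are assumptions.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if p <= 0:
        raise ValueError("this sketch only covers p > 0")
    # Coordinatewise maximum, raised to the p-th power, then summed.
    total = sum(max(a, b) ** p for a, b in zip(x, y))
    return total ** (1.0 / p)


x = [3, 0, 5, 1]
y = [1, 4, 2, 1]
# Coordinatewise maxima are 3, 4, 5, 1.
print(dominant_p_norm(x, y))       # max-dominance norm: 3 + 4 + 5 + 1
print(dominant_p_norm(x, y, p=2))  # square root of 9 + 16 + 25 + 1
```

The exact computation needs both vectors in one place and linear space, which is the cost the cited approximation protocols are designed to avoid.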

“…One advantage is that we transform any protocol for ℓ_p into a PAP, making new tradeoffs possible. We can use protocols more suitable for inputs given as a list of ranges [5,12,27,53], with faster update time [42,52], or that use less randomness [42,43]. For example, we improve the update time of [47] for ℓ_2 by a factor of k using the algorithm of [59] with ε = 1/log n (to do binary search), while for p ∈ (0, 2) we improve [47] by a factor of k/poly(log log n) using the algorithm of [42].…”

Section: Applications (mentioning)

confidence: 99%


Near-optimal private approximation protocols via a black box transformation

Woodruff

2011

Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing


We show the following transformation: any two-party protocol for outputting a (1 + ε)-approximation to $f(x, y) = \sum_{j=1}^{n} g(x_j, y_j)$ with probability at least 2/3, for any nonnegative efficiently computable function g, can be transformed into a two-party private approximation protocol with only a polylogarithmic factor loss in communication, computation, and round complexity. In general it is insufficient to use secure function evaluation or fully homomorphic encryption on a standard, non-private protocol for approximating f. This is because the approximation may reveal information about x and y that does not follow from f(x, y). Applying our transformation and variations of it, we obtain near-optimal private approximation protocols for a wide range of problems in the data stream literature for which previously nothing was known. We give near-optimal private approximation protocols for the ℓ_p-distance for every p ≥ 0, for the heavy hitters and importance sampling problems with respect to any ℓ_p-norm, for the max-dominance and other dominant ℓ_p-norms, for the distinct summation problem, for entropy, for cascaded frequency moments, for subspace approximation and block sampling, and for measuring independence of datasets. Using a result for data streams, we obtain private approximation protocols with polylogarithmic communication for every non-decreasing and symmetric function $g(x_j, y_j) = h(x_j - y_j)$ with at most quadratic growth. If the original (non-private) protocol is a simultaneous protocol, e.g., a sketching algorithm, then our only cryptographic assumption is efficient symmetric computationally-private information retrieval; otherwise it is fully homomorphic encryption. For all but one of these problems, the original protocol is a sketching algorithm. Our protocols generalize straightforwardly to more than two parties.


“…This work uses a different technique than was used in [45]. The algorithm in [45] was based on sampling, along the lines of [43], while our algorithm is based on sub-sampling and linear sketches with special properties.…”

Section: Related Work (mentioning)

confidence: 99%

“…The first works to try to overcome this were for 1-dimensional rectangles, i.e., line segments. The notion of a range-efficient sketch was introduced in [5] for F_k-estimation, and further refined for F_0 estimation in [43,47], allowing one to process a segment in time only logarithmic in its length. Many other problems have been reduced to range-efficient F_k-estimation, for k ≥ 0, such as distinct summation problem [18,43], duplicate insensitive sketches [38], maximum-dominance norm [19], self-join size of the symmetric difference of relations [44], and counting triangles in graphs [5].…”

Section: Problem Definition (mentioning)

confidence: 99%
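To make concrete the quantity that range-efficient F_0 estimation targets in the statement above, the sketch below computes the exact number of distinct integers covered by a list of closed ranges [lo, hi]. It is an offline baseline that stores and sorts every range, not the small-space sketch of the cited papers; the function name `distinct_in_ranges` and the closed-integer-range convention are assumptions made for this illustration. Like a range-efficient algorithm, its cost per range does not depend on the range's length.

```python
def distinct_in_ranges(ranges):
    """Exact count of distinct integers covered by closed ranges [lo, hi].

    Offline baseline (stores and sorts all ranges); illustrative only,
    not the streaming algorithm from the cited works.
    """
    total = 0
    cur_lo = cur_hi = None  # the merged interval currently being grown
    for lo, hi in sorted(ranges):
        if lo > hi:
            raise ValueError(f"empty range: [{lo}, {hi}]")
        if cur_hi is None or lo > cur_hi + 1:
            # Gap before this range: close the current merged interval.
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            # Overlapping or adjacent range: extend the merged interval.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


# [1, 5] and [3, 8] merge into [1, 8] (8 integers); [10, 12] adds 3 more.
print(distinct_in_ranges([(1, 5), (3, 8), (10, 12)]))
```

A streaming estimator would replace the stored list with a small sketch and return an approximation rather than this exact count.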

Rectangle-efficient aggregation in spatial data streams

Tirthapura

Woodruff

2012

Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems

Self Cite


We consider the estimation of aggregates over a data stream of multidimensional axis-aligned rectangles. Rectangles are a basic primitive object in spatial databases, and efficient aggregation of rectangles is a fundamental task. The data stream model has emerged as a de facto model for processing massive databases in which the data resides in external memory or the cloud and is streamed through main memory. For a point p, let n(p) denote the sum of the weights of all rectangles in the stream that contain p. We give near-optimal solutions for basic problems, including (1) the k-th frequency moment $F_k = \sum_{\text{points } p} |n(p)|^k$, (2) the counting version of stabbing queries, which seeks an estimate of n(p) given p, and (3) identification of heavy-hitters, i.e., points p for which n(p) is large. An important special case of F_k is F_0, which corresponds to the volume of the union of the rectangles. This is a celebrated problem in computational geometry known as "Klee's measure problem", and our work yields the first solution in the streaming model for dimensions greater than one.


Stream Sampling

Lahiri

Tirthapura

2009

Encyclopedia of Database Systems
