2015
DOI: 10.1007/s00453-015-9974-0

Space-Efficient Estimation of Statistics Over Sub-Sampled Streams

Abstract: In many stream monitoring situations, the data arrival rate is so high that it is not even possible to observe each element of the stream. The most common solution is to subsample the data stream and use the sample to infer properties and estimate aggregates of the original stream. However, in many cases, the estimation of aggregates on the original stream cannot be accomplished through simply estimating them on the sampled stream, followed by a normalization. We present algorithms for estimating frequency mom…
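The abstract's point that normalizing a sample estimate does not recover the original aggregate can be seen with the second frequency moment, F2 = sum_i f_i^2, where f_i is the frequency of item i. The sketch below assumes Bernoulli sampling (each element kept independently with probability p), so a sampled frequency g_i is Binomial(f_i, p) and E[g_i^2] = p^2 f_i^2 + p(1 - p) f_i. Rescaling the sample's F2 by 1/p^2 therefore overestimates by (1 - p) F1 / p, where F1 is the stream length, while subtracting (1 - p) g_i from each squared count first removes that bias. All function names are hypothetical, and the code keeps exact counts in a `Counter`, so it illustrates only the bias correction, not the space-efficient algorithms the paper presents.

```python
import random
from collections import Counter


def exact_f2(stream):
    """Second frequency moment F2: the sum of squared item frequencies."""
    return sum(c * c for c in Counter(stream).values())


def bernoulli_sample(stream, p, rng):
    """Keep each element independently with probability p (assumed sampling model)."""
    return [x for x in stream if rng.random() < p]


def naive_f2_estimate(sample, p):
    """Compute F2 on the sample, then normalize by 1/p^2. Biased upward."""
    return exact_f2(sample) / (p * p)


def unbiased_f2_estimate(sample, p):
    """Per item, E[g^2 - (1 - p) * g] = p^2 * f^2, so dividing by p^2 is unbiased."""
    total = sum(g * g - (1 - p) * g for g in Counter(sample).values())
    return total / (p * p)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical stream: 50,000 distinct items, each appearing exactly twice.
    stream = [i // 2 for i in range(100_000)]
    p = 0.1
    sample = bernoulli_sample(stream, p, rng)
    print("true F2:       ", exact_f2(stream))  # 200000
    print("naive estimate:", naive_f2_estimate(sample, p))  # expectation 1,100,000
    print("corrected:     ", unbiased_f2_estimate(sample, p))  # expectation 200,000
```

With the demo stream (F1 = 100,000 and F2 = 200,000) and p = 0.1, the naive estimate has expectation 200,000 + 0.9 * 100,000 / 0.1 = 1,100,000, about 5.5 times the true value, while the corrected estimate has expectation 200,000. The gap is largest when most items are rare, because the binomial variance term p(1 - p) f_i is then comparable to p^2 f_i^2.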

Cited by 10 publications (5 citation statements)
References 35 publications

“…Continuous random sampling from the set of distinct elements in a stream has been considered in [26]. The question of how to process a "sampled stream", i.e., once a stream has been sampled, is considered in [27]. A model of distributed streams related to ours was considered in [28], [29].…”
Section: Related Work (mentioning, confidence: 99%)
“…This challenge stems from the unbounded nature of data streams and the impossibility of storing them. We need to design single-pass [25] and low-time-complexity [31] algorithms analogously to what is done today in stream processing, where constant-time and polylog-space-complexity algorithms find a vast application [32]. Moreover, those models will need to explain why the predictions are correct and how much the users trust them.…”
Section: Challenges and Benefits (mentioning, confidence: 99%)
“…Last, developing stateful incremental algorithms that learn one sample at a time is very important. The smaller the data to process, the less sophisticated and the more scalable the tools are [31].…”
Section: Ziffer et al. / Towards Time-Evolving Analytics (mentioning, confidence: 99%)
“…Namely, nearly all lower bounds for the space complexity of randomized streaming algorithms are derived via reductions from communication problems. For an incomplete list of such reductions, see [58,61,45,42,46,10,16,57,48,49,40] and the references therein. Now nearly all such lower bounds (and all of the ones that were just cited) hold in either the 2-party setting (G has 2 vertices), the coordinator model, or the black-board model.…”
Section: Multi-party Communication (mentioning, confidence: 99%)