Load Shedding for Window Joins on Multiple Data Streams

Law, Yan-Nei; Zaniolo, Carlo

doi:10.1109/icdew.2007.4401054

Cited by 7 publications

(13 citation statements)

References 22 publications

“…Thus, in the context of frequent pattern mining, pattern verification [18], and any other application that consists of only counting queries, we make the following observation: from (14) and (15) we notice that both the uniform and proportional load shedding policies produce the same total variance (G_uni = G_prop). However, the uniform approach is still more favorable since it does not require knowing the f_k values, while the proportional method does.…”

Section: B. Minimizing the Relative Error (mentioning)

confidence: 99%

Optimal load shedding with aggregates and mining queries

Mozafari

Zaniolo

2010

2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)

Self Cite

Abstract: To cope with bursty arrivals of high-volume data, a DSMS has to shed load while minimizing the degradation of Quality of Service (QoS). In this paper, we show that this problem can be formalized as a classical optimization task from operations research, in ways that accommodate different requirements for multiple users, different query sensitivities to load shedding, and different penalty functions. Standard nonlinear programming algorithms are adequate for non-critical situations, but for severe overloads, we propose a more efficient algorithm that runs in linear time, without compromising optimality. Our approach is applicable to a large class of queries including traditional SQL aggregates, statistical aggregates (e.g., quantiles), and data mining functions, such as k-means, naive Bayesian classifiers, decision trees, and frequent pattern discovery (where we can even specify a different error bound for each pattern). In fact, we show that these aggregate queries are special instances of a broader class of functions, which we call reciprocal-error aggregates, for which the proposed methods apply with full generality. Finally, we propose a novel architecture for supporting load shedding in an extensible system, where users can write arbitrary User Defined Aggregates (UDA), and thus confirm our analytical findings with several experiments executed on an actual DSMS.

“…The prior work has addressed the processing of join queries under load shedding [10], [11], [16], which usually involves adhoc heuristics. For aggregate queries, which is the focus of this paper, we instead use random load shedding.…”

Section: Related Work (mentioning)

confidence: 99%

Optimal load shedding with aggregates and mining queries

Mozafari

Zaniolo

2010

2010 IEEE 26th International Conference on Data Engineering (ICDE 2010)

Self Cite
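The "random load shedding" named in the statement above can be illustrated with a minimal Python sketch for a counting query. It assumes the simplest form of the idea, Bernoulli sampling: each tuple is kept independently with probability `keep_prob`, and the sampled count is scaled by `1 / keep_prob` to stay unbiased. The function name `shed_and_count` and its parameters are hypothetical and are not taken from the cited paper, whose optimization of shedding rates across queries is not reproduced here.

```python
import random


def shed_and_count(stream, predicate, keep_prob, rng=None):
    """Estimate how many tuples satisfy `predicate` while processing only a sample.

    Hypothetical helper for illustration; not the cited paper's algorithm.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    rng = rng or random.Random()
    kept_matches = 0
    for item in stream:
        # Random load shedding: drop each tuple independently
        # with probability 1 - keep_prob.
        if rng.random() >= keep_prob:
            continue
        if predicate(item):
            kept_matches += 1
    # Scale the sampled count back up so the estimate is unbiased.
    return kept_matches / keep_prob
```

With `keep_prob=1.0` nothing is shed and the count is exact; lower values trade accuracy for less work per arriving tuple, and the variance of the estimate grows as `keep_prob` shrinks.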

“…They considered the MAX-subset measure, which maximizes the number of tuples in the approximate output of the join. The MAX-subset measure was considered for load shedding in many algorithms [40][41][42]. Das et al [26] also proposed two heuristics to determine the priority of tuples in an online join: PROB and LIFE.…”

Section: Load Shedding (mentioning)

confidence: 99%

“…Their method focused on the memory-limited situation. Many other studies including [26], [40][41][42] also assumed the memory-limited situation, and then discussed their load shedding algorithms. On the other hand, Gedik et al [39,44] emphasized a situation where the CPU becomes a bottleneck (i.e., when an input arrival rate exceeds CPU processing speed), and then proposed load shedding techniques to shed the CPU load.…”

Section: Load Shedding (mentioning)

confidence: 99%

A Review of Window Query Processing for Data Streams

Kim¹,

Kim²

2013

Journal of Computing Science and Engineering

In recent years, progress in hardware technology has made it possible to monitor many events in real time. The volume of incoming data may be so large that monitoring every individual data item is intractable. Revisiting any particular record can also be impossible in this environment. Therefore, many database schemes, such as aggregation, join, frequent pattern mining, and indexing, become more challenging in this context. This paper surveys the previous efforts to resolve these issues in processing data streams. The emphasis is on specifying and processing sliding window queries, which are supported in many stream processing engines. We also review the related work on stream query processing, including synopsis structures, plan sharing, operator scheduling, load shedding, and disorder control.

A Load Shedding Framework for XML Stream Joins

Dash

Fegaras

2010

Lecture Notes in Computer Science
