2016 IEEE 32nd International Conference on Data Engineering (ICDE) 2016
DOI: 10.1109/icde.2016.7498273

When two choices are not enough: Balancing at scale in Distributed Stream Processing

Abstract: Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated to keys which are significantly more frequent than others. Skew is remarkably more problematic in large deployments: having more workers implies fewer keys per worker, so it becomes harder to "average out" the cost of hot keys with cold keys. We propose a novel load …
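The abstract's scaling argument can be made concrete with a small simulation. This is a minimal sketch and not the paper's experimental setup. It assumes a deterministic Zipf-like key distribution, plain hash-based key grouping (every tuple of a key goes to one worker), and measures imbalance as maximum worker load divided by mean load. The function names `zipf_counts`, `stable_hash`, `key_grouping_loads` and `imbalance` are hypothetical. Because key grouping cannot split a key, the worker that holds the hottest key carries at least that key's share p1 of the stream, so max/mean >= p1 * n for n workers. This lower bound grows linearly with n, which is why adding workers makes hot keys harder to "average out".

```python
import hashlib


def zipf_counts(num_keys: int, exponent: float, total: int) -> dict[str, int]:
    """Deterministic Zipf-like frequencies: key k gets weight proportional to 1 / k**exponent."""
    weights = [1.0 / (k ** exponent) for k in range(1, num_keys + 1)]
    norm = sum(weights)
    return {f"key{k}": max(1, round(total * w / norm)) for k, w in enumerate(weights, start=1)}


def stable_hash(key: str) -> int:
    """Process-independent hash (Python's built-in str hash is randomized per run)."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


def key_grouping_loads(counts: dict[str, int], num_workers: int) -> list[int]:
    """Key grouping: every tuple of a key is sent to the single worker hash(key) mod n."""
    loads = [0] * num_workers
    for key, count in counts.items():
        loads[stable_hash(key) % num_workers] += count
    return loads


def imbalance(loads: list[int]) -> float:
    """Maximum worker load divided by the mean load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


counts = zipf_counts(num_keys=1000, exponent=1.0, total=100_000)
total = sum(counts.values())
p1 = max(counts.values()) / total  # share of the stream carried by the hottest key
for n in (5, 10, 50, 100):
    ratio = imbalance(key_grouping_loads(counts, n))
    print(f"n={n:3d}  max/mean={ratio:6.2f}  lower bound p1*n={p1 * n:6.2f}")
```

With these illustrative parameters the hottest key alone is about 13% of the stream, so once there are more than a handful of workers the bound, not the hash function, dominates the imbalance.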

Cited by 56 publications (13 citation statements)
References 21 publications
“…Single Machine x Fernandez [45] GP Imp External Cloud x x Lohrmann [91] GP Imp Cluster, Cloud x x Zygouras [158] CEP Dec Cloud x Schneider [125] GP Dec Cloud x x Rivetti [118] GP Imp Cluster x Mayer [96,99] CEP Imp Cloud x Wu [147] GP Imp (External) Cluster x Nasir [108,109] GP Imp Cluster x x Saleh [120] CEP Dec Cluster x x Koliousis [78] GP Dec GPU x x x Zacheilas [153] CEP Dec Cloud x Nakamura [107] GP Imp Fog x Gedik [53] GP Spark Streaming and Structured Streaming by Spark [154]: As Spark applications originally processed batches, the Spark Streaming extensions process streamed data in micro batches. Structured Streaming interprets data streams as an unbounded table where each new data item extends the table.…”
Section: Parallelization (mentioning)
confidence: 99%
“…This optimization technique is known as "the power of two choices" and provides significant improvement in load balancing [10]. Later, Nasir et al extend their approach to allow for more than two choices for "hot" keys that impose most of the workload [109]. In all of these approaches, a combiner is needed to combine the shuffled state of each key.…”
Section: Parallelization For General Stream (mentioning)
confidence: 99%
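The citation statement above summarizes the idea: ordinary keys get two candidate workers ("the power of two choices"), hot keys get more, and a downstream combiner merges the partial state of any key that was split. The sketch is illustrative only and rests on several assumptions that are not taken from the paper: hot keys are flagged with exact per-source counters against a fixed frequency threshold after a warm-up period (the published approach has its own hot-key detection and its own rules for the number of choices, which are not reproduced here); candidate workers come from seeded hashes; each source picks the least-loaded candidate using only its local count of tuples sent; and `d_hot >= num_workers` means "any worker". `HotKeyAwarePartitioner`, `seeded_hash`, `combine`, `run` and all parameter names are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict


def seeded_hash(key: str, seed: int) -> int:
    """Deterministic 64-bit hash of (seed, key); different seeds act as different hash functions."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HotKeyAwarePartitioner:
    """Hypothetical source-side partitioner (illustrative, not the paper's algorithm).

    Ordinary keys: 2 hash choices. Hot keys: d_hot hash choices, or every worker
    when d_hot >= num_workers. Each tuple goes to the least-loaded candidate
    according to this source's local load counters.
    """

    def __init__(self, num_workers: int, d_hot: int, hot_threshold: float, warmup: int = 100):
        self.num_workers = num_workers
        self.d_hot = d_hot
        self.hot_threshold = hot_threshold   # minimum share of tuples seen so far
        self.warmup = warmup                 # tuples observed before any key can be hot
        self.local_load = [0] * num_workers  # tuples this source has sent to each worker
        self.key_counts = Counter()          # exact counts: a simplification for the sketch
        self.seen = 0

    def candidates(self, key: str, hot: bool) -> set[int]:
        if hot and self.d_hot >= self.num_workers:
            return set(range(self.num_workers))
        d = self.d_hot if hot else 2
        return {seeded_hash(key, seed) % self.num_workers for seed in range(d)}

    def route(self, key: str) -> int:
        self.seen += 1
        self.key_counts[key] += 1
        hot = (self.seen >= self.warmup
               and self.key_counts[key] / self.seen >= self.hot_threshold)
        # Least-loaded candidate by local load; ties broken by worker id.
        target = min(self.candidates(key, hot), key=lambda w: (self.local_load[w], w))
        self.local_load[target] += 1
        return target


def combine(partials):
    """Combiner: sum the per-worker partial counts of every key."""
    total = defaultdict(int)
    for partial in partials:
        for key, count in partial.items():
            total[key] += count
    return dict(total)


def run(num_workers: int, d_hot: int, stream):
    """Route a stream, keep a per-worker partial word count, then combine."""
    part = HotKeyAwarePartitioner(num_workers, d_hot, hot_threshold=0.2)
    partials = [Counter() for _ in range(num_workers)]
    for key in stream:
        partials[part.route(key)][key] += 1
    return part, partials, combine(partials)


# Synthetic skewed stream: "hot" is 60% of tuples, the rest spread over 97 keys.
stream = ["hot" if i % 5 < 3 else f"k{i % 97}" for i in range(1000)]
part, partials, merged = run(num_workers=8, d_hot=8, stream=stream)
hot_workers = sorted(w for w in range(8) if partials[w]["hot"] > 0)
print("workers holding a partial count of 'hot':", hot_workers)
print("per-worker load:", part.local_load)
```

Because the hot key ends up on several workers, each of them holds only a partial count for it; `combine` plays the role of the combiner mentioned in the quote and restores the same totals a single-worker count would give. Cold keys never leave their two hash choices, so their state stays confined to at most two workers.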
“…Online Adaptation in Stream Processing. Several works proposed adaptive scaling and load balancing of stream-processing systems [22,40,41,45,53]. Lohrmann et al [37] adaptively adjusted the buffer sizes and performs task chaining according to the QoS constraints.…”
Section: Related Work (mentioning)
confidence: 99%
“…Partial-key based [43] partition functions [126] add aggregation cost to model [127] key splitting and local load estimation [139] associates a key to more than two possible nodes Executor-centric [128] elastic executors + model-based scheduler Migration-based [111] transactional migration protocol and thread-to-slice mapping Table 4. A characterization of partitioning schemes.…”
Section: Partitioning Type | System | Main Focus | Objective (mentioning)
confidence: 99%