2013
DOI: 10.1007/s00778-013-0335-9

Partitioning functions for stateful data parallelism in stream processing

Abstract: In this paper, we study partitioning functions for stream processing systems that employ stateful data parallelism to improve application throughput. In particular, we develop partitioning functions that are effective under workloads where the domain of the partitioning key is large and its value distribution is skewed. We define various desirable properties for partitioning functions, ranging from balance properties such as memory, processing, and communication balance, structural properties such as compactne…

citations
Cited by 102 publications
(81 citation statements)
references
References 21 publications
“…The state migration must be performed by respecting the computation semantics (state partitions must be accessed atomically) and must not violate the strict sequential order in which items of the same group must be processed. Techniques to solve this issue in the case of autonomic/elastic solutions have been proposed in some past research works [9,15]. Symmetric actions are taken in the case of parallelism degree reduction.…”
Section: Fpsap (mentioning)
confidence: 99%
“…To overcome this problem, a plethora of Adaptive Query Processing (AQP) techniques have been recently proposed in the literature aiming to adapt the runtime query plan in response to changes in the execution environment or the characteristics of the streaming data [4][5][6][7][8]. The rationale followed by these AQP techniques can be condensed into a three-phase procedure, called adaptivity loop [9].…”
Section: Main Text (mentioning)
confidence: 99%
“…where G1 and G2 are given by (4) and Beta is the beta function [21]. The α and β estimates can then be found by minimizing Eq.…”
Section: Appendix (mentioning)
confidence: 99%
“…A recent paper [11] discusses how to define a partitioning function, which can balance load, memory and bandwidth, while ensuring changes to the partitioning function impact as small a subset of keys as possible, in order to minimize the need for state migration. The method of "The Power of Two Choices" (PoTC) [29] continuously defines two hash functions h1(x) and h2(x), such that each key x can be sent to one of two alternative downstream operator instances.…”
Section: Adaptive Scheduler (mentioning)
confidence: 99%
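
The two-hash routing idea described in the last citation statement can be illustrated with a minimal Python sketch. This is not the implementation from [29] or from the cited paper: the class name `TwoChoicePartitioner`, the seeded SHA-256 hashing, and the use of a simple per-instance tuple counter as the load estimate are all assumptions made for illustration.

```python
import hashlib


def seeded_hash(key: str, seed: int, n: int) -> int:
    """Map a key to an instance index in [0, n) using a seeded SHA-256 digest."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


class TwoChoicePartitioner:
    """Hypothetical router: each key has two candidate instances, h1(x) and h2(x)."""

    def __init__(self, num_instances: int) -> None:
        self.n = num_instances
        # Assumed load estimate: number of tuples routed to each instance so far.
        self.load = [0] * num_instances

    def candidates(self, key: str) -> tuple[int, int]:
        # The two hash functions differ only in their seed (an illustrative choice).
        return seeded_hash(key, 1, self.n), seeded_hash(key, 2, self.n)

    def route(self, key: str) -> int:
        first, second = self.candidates(key)
        # Send the tuple to whichever candidate currently has the lower load.
        target = first if self.load[first] <= self.load[second] else second
        self.load[target] += 1
        return target


if __name__ == "__main__":
    partitioner = TwoChoicePartitioner(num_instances=4)
    # A skewed stream: key "a" is far more frequent than the others.
    stream = ["a"] * 50 + ["b"] * 10 + ["c"] * 5
    for key in stream:
        partitioner.route(key)
    print(partitioner.load)
```

Because tuples of the same key may land on either candidate instance, a stateful operator using this kind of routing would need to combine the per-key partial state downstream; that step is not shown here.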