2021
DOI: 10.1109/access.2021.3102645

Influencing Factors in the Scalability of Distributed Stream Processing Jobs

Abstract: More and more use cases require fast, accurate, and reliable processing of large volumes of data. This requires a distributed stream processing framework that can spread the load over several machines. In this work, we study and benchmark the scalability of stream processing jobs in four popular frameworks: Flink, Kafka Streams, Spark Streaming, and Structured Streaming. In addition, we determine the factors that influence the performance and efficiency of scaling processing jobs with distinct ch…
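The abstract refers to the performance and efficiency of scaling processing jobs. As an illustration only, the sketch below applies two common textbook definitions: speedup (throughput of a scaled run divided by throughput of a baseline run) and scaling efficiency (speedup divided by the increase in workers, where 1.0 means linear scaling). These definitions, the `Measurement` type, and all numbers are hypothetical assumptions, not the metrics or results reported in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One hypothetical benchmark run: cluster size and sustained throughput."""
    workers: int
    throughput: float  # events per second


def scaling_efficiency(base: Measurement, scaled: Measurement) -> tuple[float, float]:
    """Return (speedup, efficiency) of `scaled` relative to `base`.

    speedup    = scaled throughput / base throughput
    efficiency = speedup / (scaled workers / base workers); 1.0 means linear scaling.
    """
    if base.workers <= 0 or scaled.workers <= 0 or base.throughput <= 0:
        raise ValueError("workers and baseline throughput must be positive")
    speedup = scaled.throughput / base.throughput
    ideal = scaled.workers / base.workers  # speedup expected under perfect scaling
    return speedup, speedup / ideal


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    base = Measurement(workers=1, throughput=10_000.0)
    for run in (Measurement(2, 19_000.0), Measurement(4, 32_000.0)):
        s, e = scaling_efficiency(base, run)
        print(f"{run.workers} workers: speedup {s:.2f}, efficiency {e:.2f}")
```

With these invented numbers, doubling the workers yields an efficiency of 0.95 and quadrupling them yields 0.80, i.e. throughput grows but less than linearly with cluster size.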

Cited by 15 publications (7 citation statements); references 50 publications.

“…While MQTT is ideal for real-time data communication near the edge, it is not the most ideal choice for data streaming at the core where data from all distributed segments of the facility sinks. Data moves from MQTT to Kafka which is designed to handle high-velocity, high-volume real-time data [59,60], and is commonly used in real-time event detection systems [61]. Once data is available, the microservices-based design plays a large role in ensuring timely execution of various tasks [62,63].…”
Section: Operational Cost (mentioning; confidence: 99%)
“…Although including a cost model in cloud benchmarking studies is considered good scientific practice [37], in existing benchmarking studies on FaaS [24,38–41] and DSP [4,42,43], cost evaluations can mainly be found for cloud functions, where the pay-per-execution pricing model has presented a significant paradigm shift.…”
Section: Related Work (mentioning; confidence: 99%)
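The statement above contrasts pay-per-execution pricing with conventional provisioned resources. A minimal sketch of why this matters for cost models, assuming a generic GB-second billing scheme for cloud functions and a flat hourly price for always-on instances; the function names and every price are hypothetical and do not come from any provider or from the cited works.

```python
def pay_per_execution_cost(invocations: int, avg_duration_s: float, memory_gb: float,
                           price_per_gb_s: float, price_per_million_requests: float) -> float:
    """Cost billed per actual execution (hypothetical GB-second model)."""
    compute = invocations * avg_duration_s * memory_gb * price_per_gb_s
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


def provisioned_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Cost of always-on instances, billed regardless of load."""
    return instances * hours * price_per_instance_hour


if __name__ == "__main__":
    # All prices below are made up for illustration only.
    hours = 24 * 30  # one 30-day month
    fixed = provisioned_cost(instances=2, hours=hours, price_per_instance_hour=0.10)
    for invocations in (1_000_000, 100_000_000):
        variable = pay_per_execution_cost(invocations, avg_duration_s=0.2, memory_gb=0.5,
                                          price_per_gb_s=0.00002,
                                          price_per_million_requests=0.20)
        print(f"{invocations:>11,} invocations: "
              f"pay-per-execution {variable:8.2f} vs provisioned {fixed:.2f}")
```

Under these invented prices, the provisioned setup costs 144.00 per month at any load, while pay-per-execution costs 2.20 at one million invocations and 220.00 at one hundred million. Which model is cheaper therefore depends on the workload, which is why a benchmark's cost evaluation has to state its pricing model.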
“…In the following, we give a brief overview of modern distributed stream processing frameworks, particularly suited for implementing event-driven microservices.⁶ For a detailed comparison, see the works of, for example, Hesse and Lorenz [HL15], Fragkoulis et al. [FCK+20], and van Dongen [vDon21].…”
Section: Modern Stream Processing Framework (mentioning; confidence: 99%)
“…OSPBench provides benchmarks for analyzing traffic sensor data. Besides evaluations of latency, throughput, and resource usage, van Dongen and van den Poel used OSPBench to also evaluate scalability [vDvdP21b] and fault recovery [vDvdP21a]. In contrast to most other benchmarks, OSPBench provides implementations for the rather new framework Kafka Streams, which is also intensively studied in this thesis.…”
Section: Related Work (mentioning; confidence: 99%)