2018 IEEE 34th International Conference on Data Engineering (ICDE) 2018
DOI: 10.1109/icde.2018.00169

Benchmarking Distributed Stream Data Processing Systems

Abstract: The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear gap of detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of thre…

Cited by 162 publications (134 citation statements)
References 20 publications
“…For example, the number of deployed nodes alone can have a different impact on each of the considered frameworks. In our case, we kept a rather simple nodes topology, but for more complex topologies the results can be different (as previous research has shown, e.g., [37]). Another internal threat to validity is given by the implementation differences of the algorithm on each platform.…”
Section: Results (mentioning)
confidence: 98%
“…Karimov et al [37] found Flink to have more than three times faster throughput than Spark and Storm for aggregations. Joins were more than two times faster for Flink than Spark.…”
Section: A Compared Framework (mentioning)
confidence: 99%
“…Following table-1 shows the comparative analysis of distinct tools and techniques to handle the issues of latency and throughput. [4] Buffering mechanism DSP Engine [6] Queue of Events before processing Google Data Flow [5] Watermark & Trigger…”
Section: B Event Time Window (mentioning)
confidence: 99%
“…Several solutions are available to handle this problem [4]. Distributed computing is one possible solution [5], and become the most efficient and fault-tolerant method for companies to store and process massive amounts of data. Among this new group of tools, MapReduce and Spark are the most commonly used cluster computing tools.…”
Section: Introduction (mentioning)
confidence: 99%