2017 IEEE International Young Scientists Forum on Applied Physics and Engineering (YSF) 2017
DOI: 10.1109/ysf.2017.8126655

Performance evaluation of distributed computing environments with Hadoop and Spark frameworks

Abstract: Recently, due to the rapid development of information and communication technologies, data are being created and consumed at an avalanche-like rate. Distributed computing creates the preconditions for analyzing and processing such Big Data by distributing the computations among a number of compute nodes. In this work, the performance of distributed computing environments based on the Hadoop and Spark frameworks is estimated for real and virtual versions of clusters. As a test task, we chose the classic use case of wor…

Cited by 18 publications (5 citation statements)
References 16 publications
“…It also introduces a directed acyclic graph (DAG) task segmentation mechanism to operate on RDD in a way similar to MapReduce. Spark in-memory computing is much faster than Hadoop, which makes Spark the current mainstream batch big data analysis platform [55][56][57]. The pros and cons of Spark in big data analysis will be discussed in the next section.…”
Section: Spark (mentioning)
confidence: 99%
“…Likewise, the in-house Hadoop cluster setup and Amazon EC2 instances are also used to evaluate the Hadoop performance. Khan et al [30] have modeled the estimation of the provisioning of the resources and completion time of the jobs. Furthermore, the Hadoop and Spark-based distributed system performance has been evaluated by Taran et al [31].…”
Section: Related Work (mentioning)
confidence: 99%
“…The primary reason for the performance decline was evident as Spark cache size could not fit into the memory for the larger dataset. Taran et al [34] quantified performance differences of Hadoop and Spark using WordCount dataset which was ranging from 100 KB to 1 GB. It was observed that Hadoop framework was five times faster than Spark when the evaluation was performed using a larger set of data sources.…”
Section: Processing Speed (mentioning)
confidence: 99%
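
The WordCount workload named in the statement above can be illustrated with a minimal single-process sketch of the map, shuffle, and reduce steps. This is not the benchmark code used in the paper, and it uses neither the Hadoop nor the Spark API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only to show the shape of the computation that both frameworks distribute across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted counts by word, as a framework's shuffle step would."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical sample input, not data from the paper.
    sample = ["big data big clusters", "Big Data"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps run in parallel on separate nodes and the shuffle moves data over the network, which is where the performance differences discussed in the citing papers arise; this sketch only shows the logical data flow.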