2018 IEEE 11th International Conference on Cloud Computing (CLOUD) 2018
DOI: 10.1109/cloud.2018.00042

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

Abstract: In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. Data-parallel computing frameworks, such as Apache Spark, are widely used to perform such data processing at scale. Specifically, Spark leverages distributed memory to cache the intermediate results, represented as Resilient Distributed Datasets (RDDs). This gives Spark an advantage over other parallel frameworks for implementations of iterative machine learning and…

Cited by 32 publications (14 citation statements)
References 36 publications (51 reference statements)
“…MR-SPS [21] designs a scalable parallel scheduling algorithm which improves scalability and performance of a cluster by managing workload and data locality. Studies [22]-[24] further investigate storage-related resource management problems, in order to improve the system performance bottlenecked by I/Os. BGMRS [25] is a MapReduce Scheduler based on the Bipartite Graph model.…”
Section: Related Work (mentioning)
confidence: 99%
“…They found that in-memory data analytics has some constraints with respect to limitations and performance. Yang et al [19] studied Apache Spark for data caching optimization with respect to big data analytics. They found that its RDD feature is very useful in this regard.…”
Section: Related Work (mentioning)
confidence: 99%
“…80% reduction in the distance, on average, was achieved compared to the distance obtained by direct transmission. Several studies discussed QoS routing in WSNs including [19], [20], [21]. The study in [19] presented a multi-objective genetic algorithm for efficient QoS routing in two tiered WSNs.…”
Section: Related Work (mentioning)
confidence: 99%