2017
DOI: 10.1109/tpds.2016.2587645

iShuffle: Improving Hadoop Performance with Shuffle-on-Write

Cited by 81 publications (48 citation statements)
References 21 publications
“…Within this framework, data shuffling often appears to limit the performance of distributed computing applications, including self-join [6], tera-sort [7], and machine learning algorithms [8]. For example, in Facebook's Hadoop cluster, it is observed that 33% of the overall job execution time is spent on data shuffling [8].…”
mentioning
confidence: 99%
“…However, this approach relies on the RDMA feature of the InfiniBand network, which is not available on commodity network hardware. iShuffle [19] proposed an independent shuffle service for multi-tenant Hadoop clusters. It decouples shuffle and reduce, so that shuffle can be performed without running reduce tasks.…”
Section: Related Work
mentioning
confidence: 99%
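
The decoupling this statement describes can be pictured with a small simulation: map output is pushed toward its reducer as each spill is written, rather than pulled after reduce tasks start. The following is a minimal Python sketch of the shuffle-on-write idea only; the names (push_spill, REDUCER_NODES, run_mapper) are illustrative assumptions, not iShuffle's actual interfaces.

from collections import defaultdict

NUM_REDUCERS = 3
# Hypothetical stand-in for remote reducer nodes: partition id -> received records.
REDUCER_NODES = defaultdict(list)

def partition(key):
    # Hash-partition a key to a reducer, in the spirit of Hadoop's HashPartitioner.
    return hash(key) % NUM_REDUCERS

def push_spill(spill):
    # Shuffle-on-write: ship each partition of a finished spill immediately,
    # so data movement overlaps with the remaining map work instead of
    # waiting for reduce tasks to pull it later.
    for part, records in spill.items():
        REDUCER_NODES[part].extend(records)

def run_mapper(records, spill_threshold=4):
    spill = defaultdict(list)
    buffered = 0
    for key, value in records:
        spill[partition(key)].append((key, value))
        buffered += 1
        if buffered >= spill_threshold:
            push_spill(spill)              # push as the spill is written
            spill, buffered = defaultdict(list), 0
    push_spill(spill)                      # flush the final partial spill

run_mapper([("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 3)])
for part in sorted(REDUCER_NODES):
    print(f"reducer {part} holds {REDUCER_NODES[part]} before any reduce task runs")

The point of the sketch is only the ordering: every record reaches its reducer-side buffer while the map phase is still running, which is what lets shuffle time hide behind map time.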
“…One of the most popular cloud computing platforms is Hadoop, an open source MapReduce implementation for processing large datasets. In the Hadoop context, the shuffle phase is the process of transferring data from mappers to reducers, which becomes the bottleneck in large jobs [6]. In the item-based CF algorithm, computing the similarity matrix for items and calculating the prediction matrix for users are the most resource-intensive operations; thus, reducing the intermediate data during the shuffle phase can provide a substantial performance gain.…”
Section: Introduction
mentioning
confidence: 99%
“…In the Hadoop context, the shuffle phase is the process of transferring data from mappers to reducers, which becomes the bottleneck in large jobs [6]. In the item-based CF algorithm, computing the similarity matrix for items and calculating the prediction matrix for users are the most resource-intensive operations; thus, reducing the intermediate data during the shuffle phase can provide a substantial performance gain. In this paper, we propose an optimized MapReduce for the item-based CF algorithm integrated with empirical factors.…”
Section: Introduction
mentioning
confidence: 99%
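
The intermediate-data reduction these excerpts point to is commonly achieved with a map-side combiner: co-occurrence counts are pre-aggregated before they cross the network. The sketch below, in the same spirit as the one above, shows that effect on a toy item-based CF workload; the pairing scheme and all names (map_cooccurrences, combine) are illustrative assumptions, not the cited paper's exact algorithm.

from collections import Counter
from itertools import combinations

def map_cooccurrences(user_items):
    # Map step: emit one ((item_i, item_j), 1) record per co-rated pair,
    # the raw input to an item-item similarity computation.
    for pair in combinations(sorted(user_items), 2):
        yield pair, 1

def combine(mapped):
    # Combiner: sum counts locally, so each distinct pair crosses the
    # network once per mapper instead of once per occurrence.
    counts = Counter()
    for pair, one in mapped:
        counts[pair] += one
    return counts

users = [["milk", "bread", "eggs"], ["milk", "bread"], ["bread", "eggs"]]
raw = [record for items in users for record in map_cooccurrences(items)]
combined = combine(raw)
print(len(raw), "shuffle records without a combiner,", len(combined), "with one")

On real rating data the gap grows with item popularity, since a hot item pair is emitted once per co-rating user but shipped only once per mapper after combining.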