2009
DOI: 10.1007/978-3-642-03770-2_30

Towards Efficient MapReduce Using MPI

Abstract: MapReduce is an emerging programming paradigm for data-parallel applications. We discuss common strategies to implement a MapReduce runtime and propose an optimized implementation on top of MPI. Our implementation combines redistribution and reduce and moves them into the network. This approach especially benefits applications with a limited number of output keys in the map phase. We also show how anticipated MPI-2.2 and MPI-3 features, such as MPI_Reduce_local and nonblocking collective operations, c…
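
To illustrate why a limited number of output keys matters, here is a minimal sketch, not the authors' code. It assumes a small key set known to every rank in advance, so each rank can map its local records into a fixed-size vector of counters; the element-wise sum over those vectors then plays the role that a reduction collective such as MPI_Reduce with MPI_SUM could play. The ranks are simulated inside one Python process, and `KEYS`, `map_phase`, `combine`, `simulated_reduce`, and `word_count` are hypothetical names chosen for this example.

```python
from functools import reduce

# Hypothetical fixed key set, assumed to be known to every rank in advance.
# This assumption is what allows one fixed-size vector per rank.
KEYS = ["error", "info", "warn"]
KEY_INDEX = {key: i for i, key in enumerate(KEYS)}


def map_phase(records):
    """Map a rank's local records to a dense vector, one counter per key."""
    counts = [0] * len(KEYS)
    for record in records:
        index = KEY_INDEX.get(record)
        if index is not None:  # records outside the key set are ignored
            counts[index] += 1
    return counts


def combine(a, b):
    """Element-wise sum of two vectors (the role MPI_SUM would play)."""
    return [x + y for x, y in zip(a, b)]


def simulated_reduce(per_rank_vectors):
    """Single-process stand-in for a reduction collective over all ranks."""
    return reduce(combine, per_rank_vectors)


def word_count(per_rank_records):
    """Run the map phase per simulated rank, then reduce the vectors."""
    vectors = [map_phase(records) for records in per_rank_records]
    totals = simulated_reduce(vectors)
    return dict(zip(KEYS, totals))
```

In this sketch no key/value pairs are shuffled between ranks: redistribution and reduction collapse into one combining step over equal-length vectors. That only works while the key set stays small enough for a dense vector, which matches the abstract's remark about a limited number of output keys.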

Cited by 63 publications
(32 citation statements)
References 20 publications
“…In fact, this kind of comparison is not completely appropriate. Recent research has shown how the MapReduce approach can be implemented using corresponding functions in the MPI protocol (Hoefler et al 2009). Our principal goal here is not to advocate the specific MapReduce implementation we have used here: rather it is to emphasise that several kinds of parallelism can be achieved by definitions of two simple functions.…”
Section: Discussion (mentioning)
confidence: 99%
“…In terms of scheduling, literature [22] tried to use a priority-based scheduling strategy to improve efficiency of MapReduce. Literature [23] proposed the MapReduce optimized implementation based on MPI, using MPI-3 new features such as MPI Reduce Local to get 25% of the performance on the cluster of 127 nodes. Purdue University [24] researchers take the method of hunger-by loosening the synchronization requirements of schedule (eager scheduling) to improve efficiency of the MapReduce task [25]. Barcelona Supercomputer Center and researchers at the IBM Watson laboratory research scheduling strategy [26], with a view to improve performance.…”
Section: Summarization and Prospect (mentioning)
confidence: 99%
“…The power offered to users by this abstraction has advocated new approaches at solving large-scale problems in industrial settings [8]. There are also systems that have implemented MapReduce on top of MPI [13,22] as well as multi-GPU architectures [25].…”
Section: Simplified Large-scale Data Processing (mentioning)
confidence: 99%