Proceedings of the 19th Annual International Conference on Supercomputing 2005
DOI: 10.1145/1088149.1088183

Optimization of MPI collective communication on BlueGene/L systems

Abstract: BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of low power dual-processor compute nodes interconnected by high speed torus and collective networks. Because compute nodes do not have shared memory, MPI is the natural programming model for this machine. The BlueGene/L MPI library is a port of MPICH2. In this paper we discuss the implementation of MPI collectives on BlueGene/L. The MPICH2 implementation of MPI collectives is based on point-to-point communication primi…

Cited by 120 publications (76 citation statements)
References 19 publications
“…The butterfly-like algorithm has been developed some times ago [22,27] and has been extended to handle non-power-of-two numbers of processes [23]. Various architecture specific all-reduce schemes have also been developed [1,4,12,17,26]. An all-reduce algorithm was designed for BlueGene/L systems in [1].…”
Section: Ethernet Switched Cluster Results (mentioning)
confidence: 99%
“…Various architecture specific all-reduce schemes have also been developed [1,4,12,17,26]. An all-reduce algorithm was designed for BlueGene/L systems in [1]. In [12], an all-reduce scheme that takes advantage of remote DMA (RDMA) capability was developed for VIA-based clusters.…”
Section: Ethernet Switched Cluster Results (mentioning)
confidence: 99%
“…The BlueGene/L and BlueGene/Q supercomputers feature specialized collective networks that perform these reductions completely in hardware, using ALUs embedded in network routers [7,14]. In contrast to Coup, their main advantage is minimizing the latency of scalar or short reductions across a very large number of nodes.…”
Section: Additional Related Work (mentioning)
confidence: 99%
“…BG/L-MPI [9] has successfully exploited the rich features of BG/L in terms of the network topology, special purpose network hardware, and architectural compromises. While BG/L-MPI is originally ported from MPICH2 [3], its collective routines have demonstrated superior performance comparing to the original implementation and is close to the peak capabilities of the networks and processors.…”
Section: Blue Gene/L: A Parallel I/O Perspective (mentioning)
confidence: 99%
“…all processes reading the same data from a file), the inter-process data exchange phase may dominate the overall performance. To address the issue of the communication phase of MPI I/O collective operations, we rely on the BG/L MPI implementation [9] as it has successfully explored and utilized the rich network features of BG/L machine. We tuned the communication phase of MPI I/O collective operations to choose the best performing communication method among BG/L MPI routines.…”
Section: Communication Phase Optimizations (mentioning)
confidence: 99%