Architecture of a High-Speed MPI_Bcast Leveraging Software-Defined Network

Khureltulga, Dashdavaa; Date, Susumu; Yamanaka, Hiroaki; Kawai, Eiji; Watashiba, Yasuhiro; Ichikawa, Kohei; Abe, Hirotake; Shimojo, Shinsuke

doi:10.1007/978-3-642-54420-0_86

Cited by 13 publications

(5 citation statements)

References 8 publications

“…In our previous works, we have proposed network control algorithms for MPI functions such as MPI_Bcast and MPI_Allreduce that utilize the SDN architecture [2], [3]. Based on a prototype implementation and evaluation using physical SDN switches, we have confirmed that such network control can actually improve application performance.…”

Section: Introduction (supporting)

confidence: 63%

Concept and Design of SDN-Enhanced MPI Framework

Takahashi, Khureltulga, Munkhdorj, et al. 2015

2015 Fourth European Workshop on Software Defined Networks

Self Cite

“…The controller can be extended by multiple different network control algorithms. We have already implemented and evaluated some examples of such algorithms in our prior works [2], [3]. The log analyzer is responsible for gathering runtime profiling logs of MPI applications and pass them to the SDN controller.…”

Section: Introduction (mentioning)

confidence: 99%

Concept and Design of SDN-Enhanced MPI Framework

Takahashi, Khureltulga, Munkhdorj, et al. 2015

2015 Fourth European Workshop on Software Defined Networks

Self Cite

“…Two papers [11], [12] focus on the InfiniBand clusters with hardware-supported multicast, which can improve the overall performance of broadcast significantly and however are closely dependent of the underlying interconnects. Additionally, A study [13] demonstrates that the broadcast performance can get improved on the Software-Designed network, which is of controllability. The MPI broadcast operations get optimized as a result of the network hardware acceleration for broadcast provided by the Blue Gene/Q, shown in paper [14].…”

Section: Related Work (mentioning)

confidence: 99%

A Bandwidth-Saving Optimization for MPI Broadcast Collective Operation

Zhou, Marjanovic, Niethammer, et al. 2015

2015 44th International Conference on Parallel Processing Workshops

Abstract: The efficiency and scalability of MPI collective operations, in particular the broadcast operation, plays an integral part in high performance computing applications. MPICH, as one of the contemporary widely-used MPI software stacks, implements the broadcast operation based on point-to-point operation. Depending on the parameters, such as message size and process count, the library chooses to use different algorithms, as for instance binomial dissemination, recursive-doubling exchange or ring all-to-all broadcast (allgather). However, the existing broadcast design in latest release of MPICH does not provide good performance for large messages (lmsg) or medium messages with non-power-of-two process counts (mmsg-npof2) due to the inner suboptimal ring allgather algorithm. In this paper, based on the native broadcast design in MPICH, we propose a tuned broadcast approach with bandwidth-saving in mind catering to the case of lmsg and mmsg-npof2. Several comparisons of the native and tuned broadcast designs are made for different data sizes and program sizes on Cray XC40 cluster. The results show that the performance of the tuned broadcast design can get improved by a range from 2% to 54% for lmsg and mmsg-npof2 in terms of user-level testing.
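The binomial dissemination mentioned in the abstract above can be illustrated with a small scheduling sketch. This is a generic textbook binomial-tree broadcast built from point-to-point sends, not MPICH's actual code and not the tuned design proposed in the paper; the function name `binomial_bcast_schedule` and its parameters are hypothetical, chosen only for illustration.

```python
def binomial_bcast_schedule(num_procs, root=0):
    """Return the rounds of a textbook binomial-tree broadcast.

    Each round is a list of (sender, receiver) rank pairs. In every round,
    each rank that already holds the message forwards it to one rank that
    does not, so the number of holders roughly doubles per round.
    Hypothetical helper for illustration only.
    """
    rounds = []
    step = 1  # number of ranks (relative to root) that already hold the data
    while step < num_procs:
        sends = []
        for rel in range(step):
            peer = rel + step  # relative rank of the receiver
            if peer < num_procs:
                # Map relative ranks back to real ranks by rotating by root.
                sender = (rel + root) % num_procs
                receiver = (peer + root) % num_procs
                sends.append((sender, receiver))
        rounds.append(sends)
        step *= 2
    return rounds


if __name__ == "__main__":
    for i, sends in enumerate(binomial_bcast_schedule(6)):
        print(f"round {i}: {sends}")
```

With this scheme the number of rounds grows logarithmically with the process count, which is why tree-based dissemination is typically chosen for short messages, while the abstract attributes the large-message cases to the allgather-based path.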

“…Out of this, we give the highest priority to SDN-DSM-related data to ensure DSM get fastest response time in underlying network. Multicast address for this DSM object Different from the SDN multicast implemented in [17], the SDN multicast proposed in this paper is process-oriented. It means that, all the processes using same distributed shared memory will register to same multicast group.…”

Section: 1 Implementation of Distributed Shared Memory (mentioning)

confidence: 99%