Optimization of MPI collective communication operations has been an active research topic since the advent of MPI in the 1990s. Many general and architecture-specific collective algorithms have been proposed and implemented in state-of-the-art MPI implementations. Hierarchical topology-oblivious transformation of existing communication algorithms has recently been proposed as a promising new approach to the optimization of MPI collective communication algorithms and MPI-based applications. This approach has been successfully applied to the most popular parallel matrix multiplication algorithm, SUMMA, and to state-of-the-art MPI broadcast algorithms, demonstrating significant multifold performance gains, especially on large-scale HPC systems. In this paper, we apply this approach to the optimization of the MPI Reduce and Allreduce operations. Theoretical analysis and experimental results on a cluster of the Grid'5000 platform are presented.
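To make the general idea of a hierarchical transformation concrete, the following minimal sketch shows a two-level Allreduce built only from standard MPI calls: processes are split into groups, each group reduces onto its leader, the leaders perform an Allreduce among themselves, and the result is broadcast back within each group. The group size G and the function name are illustrative assumptions; this is not the specific algorithm analyzed in the paper.

/* Sketch of a two-level (hierarchical) Allreduce for a commutative,
 * associative operation (here MPI_SUM on doubles).  G is an assumed,
 * user-chosen number of processes per group. */
#include <mpi.h>

int hier_allreduce(const double *sendbuf, double *recvbuf, int count,
                   MPI_Comm comm, int G /* processes per group (assumption) */)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Split processes into groups of size G; rank 0 of each group is its leader. */
    MPI_Comm group_comm, leader_comm;
    MPI_Comm_split(comm, rank / G, rank, &group_comm);
    int group_rank;
    MPI_Comm_rank(group_comm, &group_rank);

    /* Step 1: reduce within each group onto the group leader. */
    MPI_Reduce(sendbuf, recvbuf, count, MPI_DOUBLE, MPI_SUM, 0, group_comm);

    /* Step 2: allreduce among the group leaders only. */
    MPI_Comm_split(comm, group_rank == 0 ? 0 : MPI_UNDEFINED, rank, &leader_comm);
    if (leader_comm != MPI_COMM_NULL) {
        MPI_Allreduce(MPI_IN_PLACE, recvbuf, count, MPI_DOUBLE, MPI_SUM, leader_comm);
        MPI_Comm_free(&leader_comm);
    }

    /* Step 3: broadcast the global result from each leader to its group. */
    MPI_Bcast(recvbuf, count, MPI_DOUBLE, 0, group_comm);
    MPI_Comm_free(&group_comm);
    return MPI_SUCCESS;
}

The point of the transformation is that each phase operates on a much smaller communicator than the original one, which is what allows existing (topology-oblivious) algorithms to be reused unchanged at each level.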