MPI collective operations provide a standardized interface for performing data movements within a group of processes. The efficiency of collective communication operations depends on the actual algorithm, its implementation, and the specific communication problem (type of communication, message size, number of processes). Many MPI libraries provide numerous algorithms for specific collective operations. The strategy for selecting an efficient algorithm is often predefined (hard-coded) in MPI libraries, but some of them, such as Open MPI, allow users to change the algorithm manually. Finding the best algorithm for each case is a hard problem, and several approaches to tuning these algorithmic parameters have been proposed. We take an approach that is orthogonal to the parameter tuning of MPI collectives: instead of testing the individual algorithmic choices provided by an MPI library, we compare the latency of a specific MPI collective operation to the latency of semantically equivalent functions, which we call mock-up implementations. The structure of the mock-up implementations is defined by self-consistent performance guidelines. The advantage of this approach is that tuning using mock-up implementations is always possible, whether or not an MPI library allows users to select a specific algorithm at run-time. We implement this concept in a library called PGMPITuneLib, which is layered between the user code and the actual MPI implementation. This library selects the best-performing algorithmic pattern of an MPI collective by intercepting MPI calls and redirecting them to our mock-up implementations. Experimental results show that PGMPITuneLib can significantly reduce the latency of MPI collectives and, equally important, that it can help identify the tuning potential of MPI libraries.
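As a rough illustration of the interception mechanism described above (the abstract itself shows no code), the sketch below assumes the standard MPI profiling interface (PMPI): an interposed library defines MPI_Bcast itself and forwards either to the native implementation or to a mock-up. The mock-up shown, broadcast realized as scatter followed by allgather, corresponds to the self-consistent performance guideline that MPI_Bcast should not be slower than MPI_Scatter followed by MPI_Allgather. The flag use_mockup_bcast and the even-divisibility check are hypothetical simplifications, not PGMPITuneLib's actual selection logic.

```c
#include <mpi.h>

/* Hypothetical tuning flag; in practice the selection would be driven
 * by prior benchmark measurements. */
static int use_mockup_bcast = 1;

/* Intercept MPI_Bcast: the PMPI_ entry points reach the underlying
 * MPI library, so this wrapper can redirect the call. */
int MPI_Bcast(void *buf, int count, MPI_Datatype dtype, int root, MPI_Comm comm)
{
    int rank, size;
    PMPI_Comm_rank(comm, &rank);
    PMPI_Comm_size(comm, &size);

    /* Keep the sketch simple: fall back to the native algorithm unless
     * the mock-up is selected and the buffer splits evenly. */
    if (!use_mockup_bcast || size < 2 || count % size != 0)
        return PMPI_Bcast(buf, count, dtype, root, comm);

    int seg = count / size;
    MPI_Aint lb, extent;
    PMPI_Type_get_extent(dtype, &lb, &extent);

    /* Scatter the root's buffer so that rank i holds segment i at
     * offset i*seg; the root's own segment is already in place. */
    void *recvptr = (rank == root)
                        ? MPI_IN_PLACE
                        : (char *)buf + (MPI_Aint)rank * seg * extent;
    int rc = PMPI_Scatter(buf, seg, dtype, recvptr, seg, dtype, root, comm);
    if (rc != MPI_SUCCESS)
        return rc;

    /* Allgather the segments in place: afterwards every rank holds the
     * full buffer, matching MPI_Bcast semantics. */
    return PMPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                          buf, seg, dtype, comm);
}
```

Because the wrapper relies only on link-time interposition and the PMPI interface, this style of tuning works with any MPI library, regardless of whether that library exposes run-time algorithm selection to the user.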