Proceedings of the 22nd European MPI Users' Group Meeting 2015
DOI: 10.1145/2802658.2802669

An MPI Halo-Cell Implementation for Zero-Copy Abstraction

Cited by 9 publications (6 citation statements)
References 13 publications
“…The idea for a zero-copy framework for ghost cell exchanges has been discussed in Besnard et al (2015). Similar to our work, the authors strip away the node-local copying of ghost cell data and instead directly access the inner ghost cell data structures of node-local (and neighboring) ranks.…”
Section: A Strategy For Better Interoperability Between Gaspi and … (mentioning)
confidence: 84%
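
The zero-copy approach described above maps naturally onto MPI-3 shared-memory windows. The sketch below is illustrative only (Besnard et al. (2015) actually build on the MPC "threads as processes" runtime rather than MPI windows): each node-local rank allocates its slice inside a shared window, and a ghost value is read directly through the pointer returned by MPI_Win_shared_query, with no intermediate halo buffer. The array size and 1-D layout are assumptions made for brevity.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Group the ranks that share a node; shared windows are per node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int rank, size;
    MPI_Comm_rank(node, &rank);
    MPI_Comm_size(node, &size);

    const int nx = 1024;   /* interior cells owned by each rank (assumed size) */
    double *mine;          /* this rank's slice, allocated inside the window   */
    MPI_Win win;
    MPI_Win_allocate_shared(nx * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);
    for (int i = 0; i < nx; ++i)
        mine[i] = rank;    /* fill the interior */

    /* Query the right-hand neighbour's base pointer: its cells can then be
       read in place, with no halo buffer and no message. */
    double *right = NULL;
    MPI_Aint qsize;
    int qdisp;
    if (rank + 1 < size)
        MPI_Win_shared_query(win, rank + 1, &qsize, &qdisp, &right);

    MPI_Win_fence(0, win); /* make every rank's initialisation visible */
    double ghost = right ? right[0] : 0.0;
    MPI_Win_fence(0, win);
    printf("rank %d reads ghost value %.1f from its neighbour\n", rank, ghost);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}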
“…This implementation relies on a “threads as processes” (MPC) MPI implementation, while we start from a more general approach; we directly use MPI shared memory across multiple processes and run the solver (wherever we subsequently need to exchange corresponding data) in these shared windows. Our work also conceptually extends the work in Besnard et al (2015), as we implement a model where communication (not just computation) is visible for all processes in the shared window. Furthermore, the implementation for the pipelined Allreduce, which is presented in Section 4.4, would not be feasible with the approach from Besnard et al (2015), as all processes require access to node-local communication to optimally sustain the pipeline.…”
Section: A Strategy For Better Interoperability Between Gaspi and … (mentioning)
confidence: 91%
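
The shared-window model and the pipelined Allreduce mentioned above rely on node-local shared memory; the fragment below only sketches the pipelining idea itself in plain MPI, assuming a simple chunking of the reduced vector: each chunk is produced locally and handed to a non-blocking MPI_Iallreduce while the next chunk is still being computed. Chunk count and vector size are illustrative assumptions.

#include <mpi.h>
#include <stdlib.h>

enum { N = 1 << 20, NCHUNK = 8, CHUNK = N / NCHUNK };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(N * sizeof *buf);
    MPI_Request req[NCHUNK];

    /* Pipeline: produce chunk c locally, then reduce it with a non-blocking
       Allreduce while the next chunk is still being produced. */
    for (int c = 0; c < NCHUNK; ++c) {
        for (int i = 0; i < CHUNK; ++i)               /* "compute" chunk c */
            buf[c * CHUNK + i] = rank + i;
        MPI_Iallreduce(MPI_IN_PLACE, buf + c * CHUNK, CHUNK,
                       MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req[c]);
    }
    MPI_Waitall(NCHUNK, req, MPI_STATUSES_IGNORE);

    free(buf);
    MPI_Finalize();
    return 0;
}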
“…Common optimizations for improving scaling on HPC systems include options for combining MPI data exchanges for a number of arrays (e.g., vector or tensor components) or increasing the width of the grid halo region (see, e.g., [3]) to reduce the latency of MPI communications. The latter reduces the number of calls to MPI functions, but at the cost of additional computational overhead, which may be negligible when the size of the problem per MPI process is comparatively small.…”
Section: Methods (mentioning)
confidence: 99%
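
The wide-halo optimisation mentioned above trades redundant computation for fewer messages: a halo of width H supplies enough data for H stencil sweeps, so neighbours are contacted only every H iterations. A hedged 1-D Jacobi sketch of that pattern follows; the sizes, the update rule, and the omission of physical boundary conditions are assumptions made for brevity.

#include <mpi.h>
#include <string.h>

enum { NX = 1000, H = 4, STEPS = 100 };

static double u[NX + 2 * H], tmp[NX + 2 * H];

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < STEPS; ++step) {
        if (step % H == 0) {
            /* One wide exchange of H cells per side replaces H narrow ones. */
            MPI_Sendrecv(&u[H],      H, MPI_DOUBLE, left,  0,
                         &u[NX + H], H, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[NX],     H, MPI_DOUBLE, right, 1,
                         &u[0],      H, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        /* The valid region shrinks by one cell per sweep since the last
           exchange; those extra sweeps over halo cells are the redundant
           computation the statement refers to. */
        int shrink = step % H;
        for (int i = 1 + shrink; i < NX + 2 * H - 1 - shrink; ++i)
            tmp[i] = 0.5 * (u[i - 1] + u[i + 1]);
        memcpy(u, tmp, sizeof u);
    }
    MPI_Finalize();
    return 0;
}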
“…A more intrusive approach is to get rid of intranode halo buffers in the user's code and directly expose the neighbouring intranode processor's memory. The intranode transfer then becomes a simple direct memory-to-memory copy of halo data without any intermediate buffers (zero-copy) [5].…”
Section: Shhalo Framework For Shared-memory Halo Communication (mentioning)
confidence: 99%
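
For contrast, the following illustrative sketch (not the Shhalo code) shows the two intranode paths described above: a conventional exchange that stages halo data in intermediate pack/unpack buffers, and a zero-copy path that fills the same halo by copying directly out of the neighbouring rank's segment of a node-local shared window. Sizes and layout are assumptions.

#include <mpi.h>
#include <string.h>

enum { NX = 1024, H = 2 };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int rank, size;
    MPI_Comm_rank(node, &rank);
    MPI_Comm_size(node, &size);
    int right = (rank + 1 < size) ? rank + 1 : MPI_PROC_NULL;
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;

    /* Each rank's slice (interior plus halos) lives inside the shared window. */
    double *u;
    MPI_Win win;
    MPI_Win_allocate_shared((NX + 2 * H) * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &u, &win);
    for (int i = 0; i < NX + 2 * H; ++i)
        u[i] = rank;

    /* Path 1: conventional exchange through intermediate buffers. */
    double sendbuf[H], recvbuf[H] = {0};
    memcpy(sendbuf, &u[NX], H * sizeof(double));             /* pack   */
    MPI_Sendrecv(sendbuf, H, MPI_DOUBLE, right, 0,
                 recvbuf, H, MPI_DOUBLE, left,  0, node, MPI_STATUS_IGNORE);
    memcpy(&u[0], recvbuf, H * sizeof(double));              /* unpack */

    /* Path 2: zero-copy, the left halo is filled straight from the left
       neighbour's interior: no staging buffer and no message. */
    MPI_Win_fence(0, win);
    if (rank > 0) {
        double *nbr;
        MPI_Aint sz;
        int disp;
        MPI_Win_shared_query(win, rank - 1, &sz, &disp, &nbr);
        memcpy(&u[0], &nbr[NX], H * sizeof(double));
    }
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}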