2011
DOI: 10.1007/s00450-011-0171-3

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters

Abstract: Data parallel architectures, such as General Purpose Graphics Processing Units (GPGPUs), have seen a tremendous rise in their application for High End Computing. However, data movement in and out of GPGPUs remains the biggest hurdle to overall performance and programmer productivity. Applications executing on a cluster with GPUs have to manage data movement using CUDA in addition to MPI, the de-facto parallel programming standard. Currently, data movement with CUDA and MPI libraries is not integrated and it is not as effic…

Cited by 108 publications (58 citation statements). References 6 publications.
“…To bridge the gap between the disjointed MPI and GPU programming models, researchers have recently developed GPU-integrated MPI solutions such as our MPI-ACC [6] framework and MVAPICH-GPU [28] by Wang et al. These frameworks provide a unified MPI data transmission interface for both host and GPU memories; in other words, the programmer can use either the CPU buffer or the GPU buffer directly as the communication parameter in MPI routines. The goal of such GPU-integrated MPI platforms is to decouple the complex, low-level, GPU-specific data movement optimizations from the application logic, thus providing the following benefits: (1) portability: the application can be more portable across multiple accelerator platforms; and (2) forward compatibility: with the same code, the application can automatically achieve performance improvements from new GPU technologies (e.g., GPUDirect RDMA) if applicable and supported by the MPI implementation.…”
Section: Application Design Using GPU-Integrated MPI Framework
confidence: 99%
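
To make the unified interface described in the quote above concrete, here is a minimal point-to-point sketch assuming a CUDA-aware MPI build (for example, MVAPICH2 compiled with CUDA support); the buffer name and message size are illustrative and not taken from the paper. The device pointer is handed directly to MPI_Send/MPI_Recv, and any staging or RDMA happens inside the library.

/* Minimal sketch: point-to-point exchange with a GPU-integrated
 * (CUDA-aware) MPI. Device memory is passed straight to MPI routines. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                      /* 1M doubles, illustrative */
    double *d_buf;
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    if (rank == 0) {
        /* ... a kernel would fill d_buf here ... */
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* ... a kernel would consume d_buf here ... */
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}

The same code also works when d_buf is a host pointer, which is exactly the portability benefit the quoted passage highlights.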
“…The cudaMPI library provides wrapper API functions that combine CUDA and MPI data movement [21]. Similarly to MPI-ACC, Wang et al. propose to add CUDA [2] support to MVAPICH2 [22] and optimize the internode communication for InfiniBand networks [28]. All-to-all communication [27] and noncontiguous datatype communication [17,29] have also been studied in the context of GPU-aware MPI.…”
Section: Related Work
confidence: 99%
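
As an illustration of the noncontiguous-datatype case mentioned above [17,29], the following sketch sends one strided column of a row-major matrix held in device memory through an MPI derived datatype, assuming a GPU-aware MPI; the function and variable names are hypothetical.

/* Illustrative sketch: a noncontiguous (strided) region of a GPU-resident
 * matrix described with a derived datatype. Whether the library pipelines
 * or packs such datatypes internally is implementation-dependent. */
#include <mpi.h>

void send_device_column(double *d_matrix, int rows, int cols,
                        int col, int dest, MPI_Comm comm)
{
    MPI_Datatype column;
    /* One element per row, stride of 'cols' elements (row-major layout). */
    MPI_Type_vector(rows, 1, cols, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    /* The device pointer plus the datatype describe the noncontiguous
     * region directly; no manual packing on the host is needed. */
    MPI_Send(d_matrix + col, 1, column, dest, 0, comm);

    MPI_Type_free(&column);
}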
“…In this simple example, the only purpose of the host_buf buffer is to facilitate MPI communication of data stored in device memory. As the number of accelerators (and hence distinct memories) per node increases, manual data movement poses significant productivity problems [5].…”
Section: Challenges in CUDA+MPI Programming
confidence: 99%
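
The pattern the quoted passage refers to can be sketched as follows, assuming a plain (non-GPU-aware) MPI, so every transfer must be staged through host memory by hand; host_buf and d_buf are illustrative names.

/* Sketch of manual CUDA+MPI staging: the host buffer exists only to
 * carry data between device memory and the MPI library. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

void send_from_device(double *d_buf, int n, int dest, MPI_Comm comm)
{
    double *host_buf = (double *)malloc(n * sizeof(double));
    cudaMemcpy(host_buf, d_buf, n * sizeof(double), cudaMemcpyDeviceToHost);
    MPI_Send(host_buf, n, MPI_DOUBLE, dest, 0, comm);
    free(host_buf);
}

void recv_to_device(double *d_buf, int n, int src, MPI_Comm comm)
{
    double *host_buf = (double *)malloc(n * sizeof(double));
    MPI_Recv(host_buf, n, MPI_DOUBLE, src, 0, comm, MPI_STATUS_IGNORE);
    cudaMemcpy(d_buf, host_buf, n * sizeof(double), cudaMemcpyHostToDevice);
    free(host_buf);
}

With a GPU-integrated MPI, both helpers collapse to a single MPI call on d_buf, which is the productivity argument made in the citing paper.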
“…This work has focused on MPI point-to-point communication for internode GPU communication [5], all-to-all communication [14], and noncontiguous-type communication [15]. Similar work has also been proposed in the context of OpenMPI [16].…”
Section: Related Work
confidence: 99%
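
For the all-to-all case cited above [14], a GPU-integrated MPI lets the collective be invoked directly on device buffers. The sketch below assumes such a library; buffer names and counts are illustrative.

/* Hedged sketch: MPI_Alltoall called directly on device buffers,
 * as a GPU-aware MPI permits. */
#include <mpi.h>
#include <cuda_runtime.h>

void exchange_blocks(int per_rank_count, MPI_Comm comm)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    double *d_send, *d_recv;
    size_t bytes = (size_t)nprocs * per_rank_count * sizeof(double);
    cudaMalloc((void **)&d_send, bytes);
    cudaMalloc((void **)&d_recv, bytes);

    /* ... kernels would fill d_send here ... */

    MPI_Alltoall(d_send, per_rank_count, MPI_DOUBLE,
                 d_recv, per_rank_count, MPI_DOUBLE, comm);

    cudaFree(d_send);
    cudaFree(d_recv);
}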