2010 18th IEEE Symposium on High Performance Interconnects 2010
DOI: 10.1109/hoti.2010.22

Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine

Abstract: Collective communication operations provided by the Message Passing Interface (MPI) are heavily used by scientific applications at large scale. The current MPI standard, MPI-2.2, defines only blocking collective communication calls, and MPI-3 is expected to allow non-blocking collective communication. While thread-based designs make it possible to overlap computation and communication, resource sharing across the threads is always a concern. The newly introduced ConnectX-2 InfiniBa…

Citations: cited by 19 publications (16 citation statements)
References: 15 publications (13 reference statements)

“…To the best of our knowledge no one has attempted to show that the primitives offered by CORE-Direct are powerful enough to offload any communication schedule, or shown its limits. In [28] the authors define "building-blocks" for collectives, patterns such as 1-to-n send, or receive-and-replicate. With cDAG we provide a more fine-grained and complete abstraction than those patterns.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
“…al. proposed communication primitives for blocking collective operations with the CORE-Direct in [6]. In [10], we designed a scalable network offload based MPI_Ialltoall implementation and demonstrated up to 23% improvement with a parallel 3D FFT library.…”
Section: Impact of System Noise of PCG Run-times (mentioning)
confidence: 99%
“…Mellanox recently introduced network offload features in their ConnectX-2 [5] adapter. Using this feature, generic lists of communication tasks can be offloaded to the network interface [6]. Such an interface eliminates the need for the host processor to progress communication and provides a low-level mechanism which can be leveraged to design non-blocking collective communication algorithms.…”
Section: Introduction (mentioning)
confidence: 99%
“…In [4], a set of primitives that can be used to design collective operations to leverage the network offload feature were proposed. However, neither of these have designed scalable, non-blocking versions for data-moving collective operations.…”
Section: Designing Non-blocking Algorithms with Collective Offload (mentioning)
confidence: 99%
“…In [3,4,12], researchers have explored various facets of this interface. Using this feature, generic lists of communication tasks can be offloaded to the network interface.…”
Section: Introduction (mentioning)
confidence: 99%