2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010
DOI: 10.1109/ipdpsw.2010.5470849
Optimizing MPI communication within large multicore nodes with kernel assistance

Abstract: As the number of cores per node increases in modern clusters, intra-node communication efficiency becomes critical to application performance. We present a study of the traditional double-copy model in MPICH2 and a kernel-assisted single-copy strategy with KNEM on different shared-memory hosts with up to 96 cores. We show that KNEM suffers less from process placement on these complex architectures. It improves throughput up to a factor of 2 for large messages for both point-to-point and collective operat…
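As a concrete point of reference (not code from the paper), the sketch below is a minimal MPI ping-pong between two ranks on the same node, the kind of intra-node point-to-point measurement the abstract describes. The 4 MiB message size and iteration count are arbitrary illustrative choices, and whether the transfer goes through a double copy or a KNEM-assisted single copy is decided by the MPI library build and message size, not by this code.

```c
/*
 * Minimal intra-node ping-pong sketch (illustrative): measures
 * point-to-point throughput between two ranks on the same node.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_BYTES (4 * 1024 * 1024)   /* illustrative "large message" size */
#define ITERS     100

int main(int argc, char **argv)
{
    int rank, size;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks on one node\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    buf = malloc(MSG_BYTES);
    memset(buf, 0, MSG_BYTES);        /* touch pages before timing */

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double bytes = 2.0 * MSG_BYTES * ITERS;   /* each iteration moves the message twice */
        printf("throughput: %.2f MB/s\n", bytes / (t1 - t0) / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Built with mpicc and run with two ranks pinned to cores that do or do not share a cache, such a loop exposes the process-placement sensitivity the abstract mentions.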

Cited by 15 publications (13 citation statements); References 16 publications.
“…This approach has already proved to be valuable to increase point-to-point bandwidth between processes communicating over shared memory [5]. However, beyond the free performance upgrade offered by using more efficient point-to-point operations [14], hierarchy aware MPI collective components need more control over the underlying memory copy mechanism to reach their full potential. In this paper, we present new collective algorithms that take into account the specificities of kernel assisted memory copies, and require a new feature compared to state of the art kernel assisted copies: directional control of transfers.…”
Section: Related Work
confidence: 99%
“…While the beneficial effect of KNEM on point-to-point performance also translates into collective improvements [14], we have identified a series of additional optimizations that further boost collective communication performance. They require that the collective component has more control over the movement of data (vs. simply using MPI point-to-point primitives): (1) Because control of the kernel module is delegated to the point-to-point MPI message passing engine, using inter-process kernel-assisted memory copies results in the same data region being registered multiple times when sent to different destination processes.…”
Section: A. Issues with MPI Collective Operations
confidence: 99%
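To make the issue in the statement above concrete, here is a deliberately naive sketch (ours, not the cited implementation) of a broadcast layered on MPI point-to-point calls: the root pushes the same buffer once per peer, so a kernel-assisted point-to-point engine underneath may end up registering the same memory region once per destination.

```c
/*
 * Naive "broadcast over point-to-point" sketch (illustrative): the root
 * sends the same buffer to every peer with separate MPI_Send calls, so a
 * kernel-assisted point-to-point engine may register that region once per
 * destination -- the redundancy described in the quoted statement.
 */
#include <mpi.h>

static void naive_bcast(void *buf, int count, MPI_Datatype dt,
                        int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank == root) {
        for (int peer = 0; peer < size; peer++)
            if (peer != root)
                MPI_Send(buf, count, dt, peer, 0, comm);  /* same region, one transfer per peer */
    } else {
        MPI_Recv(buf, count, dt, root, 0, comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv)
{
    int data[1024] = { 0 };
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        for (int i = 0; i < 1024; i++) data[i] = i;

    naive_bcast(data, 1024, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```

A collective component that drives the copy mechanism itself could instead register the source region once and let every receiver pull from that single registration, which is the extra control over data movement, including the direction of the copy, that the cited work argues for.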
“…The KNEM collective [14] is implemented as an Open MPI intra-node collective component built directly on the kernel single-copy module KNEM. Details about KNEM copies can be found in [15], [16]. KNEM-based collectives mainly accelerate collective communication for large messages, not small ones, because trapping into the kernel and distributing cookies introduces an overhead (equivalent to a 16KB broadcast or a 2KB Allgather on the platforms described above) [14].…”
Section: A. Distance Between Processes
confidence: 99%
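The size dependence described above can be summarized as a simple crossover rule. The sketch below is our illustration, not the cited component's code; the 16 KB and 2 KB thresholds simply echo the break-even figures quoted for those specific platforms and are not universal.

```c
/*
 * Illustrative size-threshold selection: small messages stay on the
 * copy-in/copy-out shared-memory path, large ones switch to the
 * kernel-assisted single-copy path.  Crossover values are assumed from
 * the figures quoted above (~16 KB broadcast, ~2 KB allgather) and are
 * platform-specific.
 */
#include <stddef.h>
#include <stdio.h>

enum coll_path { COLL_SHARED_MEM_COPY, COLL_KERNEL_ASSISTED };

enum coll_path choose_bcast_path(size_t msg_bytes)
{
    const size_t bcast_crossover = 16 * 1024;      /* assumed crossover */
    return msg_bytes >= bcast_crossover ? COLL_KERNEL_ASSISTED
                                        : COLL_SHARED_MEM_COPY;
}

enum coll_path choose_allgather_path(size_t msg_bytes)
{
    const size_t allgather_crossover = 2 * 1024;   /* assumed crossover */
    return msg_bytes >= allgather_crossover ? COLL_KERNEL_ASSISTED
                                            : COLL_SHARED_MEM_COPY;
}

int main(void)
{
    printf("8 KB bcast -> %s\n",
           choose_bcast_path(8 * 1024) == COLL_KERNEL_ASSISTED
               ? "kernel-assisted" : "shared-memory copy");
    printf("1 MB bcast -> %s\n",
           choose_bcast_path(1024 * 1024) == COLL_KERNEL_ASSISTED
               ? "kernel-assisted" : "shared-memory copy");
    return 0;
}
```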
“…The KNEM assisted approach outperforms the standard transfer method in the MPICH2 implementation when no cache is shared between the processing cores, or when very large messages are being transferred. Even simply using KNEM assisted point-to-point communication underneath collective communication achieved a significant improvement [1,2]. Within Open MPI, a similar approach was implemented, with further emphasis on auto-tuning and performance portability [5].…”
Section: Related Work
confidence: 99%
“…MPICH2, since version 1.1.1, uses KNEM in the DMA LMT to improve large message performance within a single node. The work in [1,2] has shown that KNEM-enabled MPI communication significantly improves the performance of some micro- and macro-benchmarks. However, the performance of real scientific applications using KNEM-enabled MPI communication has yet to be assessed.…”
Section: Introduction
confidence: 99%