2009 International Conference on Parallel Processing 2009
DOI: 10.1109/icpp.2009.22

Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis

Abstract: The emergence of multicore processors raises the need to efficiently transfer large amounts of data between local processes. MPICH2 is a highly portable MPI implementation whose large-message communication schemes suffer from high CPU utilization and cache pollution because of the use of a double-buffering strategy, common to many MPI implementations. We introduce two strategies offering a kernel-assisted, single-copy model with support for noncontiguous and asynchronous transfers. The first one uses the now w…

Cited by 50 publications (58 citation statements)
References 12 publications
“…It also pollutes the caches by evicting application data from it as the copy operation is being performed [8]. In the end, this strategy shows very interesting latency for small messages but it is not recommended for large messages [3], [4].…”
Section: B Traditional Double-copy Implementation
Type: mentioning
confidence: 99%
“…Previous work [3], [4] introduced operating system assistance as a way to improve large message throughput. We present an in-depth study of this solution in the context of complex shared-memory machines.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…This approach can improve performance for large-message transfers among processes that do not share a cache. A variety of standard and nonstandard methods for doing so are available on Unix [2]. Windows provides an OS service for directly accessing the address space of a specified process, provided the process has appropriate security privileges.…”
Section: Intranode Communication
Type: mentioning
confidence: 99%
“…The LiMIC [7] kernel module can decrease the number of necessary memory copies to one by doing the memory movement with kernel access rights. KNEM [8] is a similar kernel module that also features DMA (Direct Memory Access) copy by using Intel I/O acceleration technique (I/OAT). DMA copy can decrease cache pollution and CPU noise from communication.…”
Section: Related Work
Type: mentioning
confidence: 99%