2010
DOI: 10.1007/s00450-010-0115-3

Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems

Abstract: The increasing popularity of multi-core processors has made MPI intra-node communication, including intra-node RMA (Remote Memory Access) communication, a critical component in high performance computing. The MPI-2 RMA model includes one-sided data transfer and synchronization operations. Existing designs in widely used MPI stacks do not provide truly one-sided intra-node RMA communication. They are built on top of two-sided send-receive operations, therefore suffering from overheads of two-sided communicatio…

citations
Cited by 14 publications
(24 citation statements)
references
References 14 publications
“…Lai et al present an OSC implementation which makes use of kernel and hardware facilities to accelerate the interprocess message transfer [14]. In a follow-up work, the authors designed an OSC implementation for conventional shared memory systems which provide hardware cache coherence [20].…”
Section: Related Work (mentioning)
confidence: 99%
“…Since PSCW is more appropriate for the regarded FFT benchmark, a PSCW synchronization scheme based on bit vectors like proposed in [14] was tuned for the SCC. A detailed discussion is out of the scope of this paper.…”
Section: Synchronization (mentioning)
confidence: 99%
“…Additionally, MPI researchers conduct long-term optimization works on the MPI RMA [16][17][18][19] and collective operations [20][21][22]. Those optimizations are also liable to be applied onto our DART and then benefit the performance of applications.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since MPI was first implemented in 1992, it has been implemented and optimized on different computing environments, e.g., multicore processors [18], [20], [11], [5], wide area network [17], and Infiniband networks [13], [28]. Our idea of grouping is partially inspired by the grouping algorithms in those previous studies.…”
Section: MPI (mentioning)
confidence: 99%