2008 37th International Conference on Parallel Processing 2008
DOI: 10.1109/icpp.2008.16

Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems

citations
Cited by 20 publications
(19 citation statements)
references
References 9 publications
“…For this reason, all IMB tests in the rest of the article will be presented with offcache enabled, assuming it better represents the performance that real applications may expect. While this methodology shows lower throughput than [3], [4], it fortunately brings comparable behaviors, especially regarding the threshold that determines when to switch from NEMESIS to KNEM: KNEM becomes interesting once the message size passes 16 KiB. It is also worth noticing here that I/OAT copy offload brings interesting performance improvements (up to 80%) as soon as KNEM is used.…”
Section: B Impact Of Cache Sharing (mentioning)
confidence: 95%
“…It also pollutes the caches by evicting application data from it as the copy operation is being performed [8]. In the end, this strategy shows very interesting latency for small messages but it is not recommended for large messages [3], [4].…”
Section: B Traditional Double-copy Implementation (mentioning)
confidence: 99%
“…However, it does not support I/OAT copy offload, vectorial buffers, or asynchronous data transfer. It has been used within MVAPICH2 with configurable thresholds for switching from the usual two-copies to the kernel-based, single-copy model [7]. However, it does not provide any automatic threshold, whereas our KNEM LMT dynamically computes its thresholds depending on the hardware characteristics.…”
Section: Related Work (mentioning)
confidence: 99%
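
The citation statements above describe switching from a double-copy shared-memory path to a kernel-assisted single-copy path once messages exceed a size threshold. The sketch below illustrates that selection logic only. The function name, the strategy labels, and the choice to treat a message of exactly the threshold size as small are all assumptions made for illustration; the 16 KiB default is the figure quoted in the first statement, not a value taken from any MPI implementation's source.

```python
KIB = 1024

# Assumption: 16 KiB is the switch point quoted in the citation statement above.
# Real implementations may use a configurable or hardware-derived value.
DEFAULT_THRESHOLD_BYTES = 16 * KIB


def choose_copy_strategy(message_bytes, threshold_bytes=DEFAULT_THRESHOLD_BYTES):
    """Pick an intra-node transfer strategy by message size (hypothetical helper).

    Returns "double-copy" for messages up to the threshold (copy through a
    shared buffer, low latency for small messages) and "single-copy" for
    larger ones (kernel-assisted direct copy).
    """
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    # Assumption: the boundary itself stays on the double-copy path.
    if message_bytes > threshold_bytes:
        return "single-copy"
    return "double-copy"


if __name__ == "__main__":
    for size in (512, 16 * KIB, 64 * KIB):
        print(size, choose_copy_strategy(size))
```

Passing a different `threshold_bytes` mimics the configurable thresholds mentioned in the Related Work statement; a dynamically computed threshold would replace the constant with a value derived from the hardware.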