2007 IEEE International Conference on Cluster Computing 2007
DOI: 10.1109/clustr.2007.4629228

Efficient asynchronous memory copy operations on multi-core systems and I/OAT

Abstract: Bulk memory copies incur large overheads such as CPU stalling (i.e., no overlap of computation with memory copy operation), small register-size data movement, cache pollution, etc. Asynchronous copy engines introduced by Intel's I/O Acceleration Technology help in alleviating these overheads by offloading the memory copy operations using several DMA channels. However, the startup overheads associated with these copy engines such as pinning the application buffers, posting the descriptors and checking for compl…

Cited by 22 publications (13 citation statements)
References 7 publications
“…User-level memory copy offload with the I/OAT DMA Engine has been studied in a single application [16]. Its comparison with offloading a regular memcpy in a thread revealed the same conclusion as ours: I/OAT becomes interesting for megabyte and larger messages [17].…”
Section: Related Work (mentioning)
confidence: 56%
“…In the context of high-performance computing, I/OAT improves inter-process communication such as shared-memory MPI implementations on multicore nodes [21]. Our I/OAT based local communication model is very similar but has the advantage of being transparently integrated into the OPEN-MX stack since the driver automatically switches from regular to local communication without needing any specific support in user-space.…”
Section: Discussion and Related Work (mentioning)
confidence: 99%
“…Indeed, OPEN-MX local communication is based on a system call where a direct copy is performed between the source process address space into the target. A comparable model has been presented in [21] as an extension to the MVAPICH MPI middleware. This design is actually nicely integrated into the OPEN-MX stack since all communications, either local or through the network, are managed by the driver through the same commands, and they return the same events to the userspace library.…”
Section: Offloading Synchronous Copies (mentioning)
confidence: 99%
“…Many works have recently considered the more general issue of copying memory regions in multicore systems using specific hardware [41,90], or how the memory management can play a significant role in the communication performance [40,84]. However, the interactions between simultaneously transferring the data to the Network Interface Card and obtaining an additional copy in the application space has not been addressed.…”
Section: Optimizing Sender-based Message Logging (mentioning)
confidence: 98%