The overhead of copying data through the central processor by a message passing protocol limits data transfer bandwidth. If the network interface directly transfers the user's memory to the network by issuing DMA, such data copies may be eliminated. Since the DMA facility accesses the physical memory address space, user virtual memory must be pinned down to a physical memory location before the message is sent or received. If each message transfer involves pin-down and release kernel primitives, message transfer bandwidth will decrease since those primitives are quite expensive. We propose a zero copy message transfer with a pin-down cache technique which reuses the pinned-down area to decrease the number of calls to pin-down and release primitives. The proposed facility has been implemented in the PM low-level communication library on our RWC PC Cluster II, consisting of 64 Pentium Pro 200 MHz CPUs connected by a Myricom Myrinet network, and running NetBSD. PM achieves 108.8 MBytes/sec for a 100% pin-down cache hit ratio and 78.7 MBytes/sec when every pin-down cache access misses. The MPI library has been implemented on top of PM. According to the NAS Parallel Benchmarks results, an application still achieves better performance even when the cache miss ratio is very high.
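The pin-down cache idea described above can be illustrated with a minimal Python sketch. It is not the PM implementation: the class name `PinDownCache`, the `pin`/`unpin` callbacks standing in for the kernel primitives, the exact-match lookup on `(address, length)`, and the LRU eviction policy are all assumptions made for illustration. The point it shows is that a region stays pinned after a transfer, so a repeated transfer from the same buffer costs no kernel call.

```python
from collections import OrderedDict


class PinDownCache:
    """Toy cache of pinned regions (hypothetical; not the PM library's API)."""

    def __init__(self, max_pinned_bytes, pin, unpin):
        self.max_pinned_bytes = max_pinned_bytes
        self.pin = pin        # stand-in for the expensive pin-down primitive
        self.unpin = unpin    # stand-in for the expensive release primitive
        self.regions = OrderedDict()  # (addr, length) keys, oldest first
        self.pinned_bytes = 0
        self.hits = 0
        self.misses = 0

    def acquire(self, addr, length):
        """Make sure [addr, addr + length) is pinned before a transfer."""
        key = (addr, length)
        if key in self.regions:
            # Hit: the region is still pinned from an earlier transfer.
            self.regions.move_to_end(key)
            self.hits += 1
            return
        self.misses += 1
        if length > self.max_pinned_bytes:
            raise ValueError("region larger than the pinned-memory limit")
        # Miss: release least recently used regions until the new one fits.
        while self.pinned_bytes + length > self.max_pinned_bytes:
            (old_addr, old_len), _ = self.regions.popitem(last=False)
            self.unpin(old_addr, old_len)
            self.pinned_bytes -= old_len
        self.pin(addr, length)
        self.regions[key] = None
        self.pinned_bytes += length


# Record the simulated kernel calls instead of touching real memory.
calls = []
cache = PinDownCache(
    max_pinned_bytes=8192,
    pin=lambda a, n: calls.append(("pin", a, n)),
    unpin=lambda a, n: calls.append(("unpin", a, n)),
)
for _ in range(3):
    cache.acquire(0x1000, 4096)   # one miss, then two hits
cache.acquire(0x3000, 4096)       # miss, still fits
cache.acquire(0x5000, 4096)       # miss, forces release of the 0x1000 region
```

In this run five transfers trigger only three pin calls and one release call. A real implementation would track pinned memory per page and handle overlapping regions, which this sketch deliberately ignores.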
This paper introduces a high performance communication middle layer, called PM2, for heterogeneous network environments. PM2 currently supports Myrinet, Ethernet, and SMP. Binary code written in PM2 or written in a communication library, such as MPICH-SCore on top of PM2, may run on any combination of those networks without re-compilation. According to a set of NAS parallel benchmark results, MPICH-SCore performance is better than dedicated communication libraries such as MPICH-BIP/SMP and MPICH-GM when running some benchmark programs.
We have developed a new communication library, called PM, for the Myrinet gigabit LAN card, that has a dedicated processor and onboard memory to handle communication protocols. To obtain high performance communication and support multi-user environments, we have co-designed PM, an operating system implemented as a daemon process, and the run-time routine for a programming language. Several unique features, e.g., network context switching and a Modified ACK/NACK flow control algorithm, have been developed for PM. The PM library has been implemented on two types of clusters: Sun SPARCstation model 20/71 workstations and Intel Pentium based PCs. PM on the Sun workstations has a round trip time of 20 μseconds for a user-level 8 byte message and a bandwidth of 38.6 Mbytes/second for an 8 Kbyte message. The result of a NAS parallel benchmark shows that a Sparc 20 workstation cluster achieves almost the same performance as a Cray T3D.
This paper presents the design of an implementation of the MPI message passing interface that uses a zero copy message transfer primitive supported by a lower communication layer to realize a high performance communication library. The zero copy message transfer primitive requires a memory area pinned down to physical memory, which is a limited resource under a paging memory system. Uncontrolled allocation of pinned-down memory by multiple simultaneous send and receive requests can cause deadlock. To avoid this deadlock, we have introduced: i) separate control of send and receive pin-down memory areas to ensure that at least one send and one receive may be processed concurrently, and ii) delayed queues to handle the postponed message passing operations whose memory could not be pinned down.
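The two deadlock-avoidance measures can be sketched as follows. This is an illustrative model only, not the MPI implementation from the paper: the class `PinBudget`, its method names, the byte-count budgets, and the first-in-first-out retry order are assumptions. Each direction gets its own pin-down budget, and an operation that cannot be pinned is parked on a delayed queue and retried when memory in that direction is released.

```python
from collections import deque


class PinBudget:
    """Pin-down budget for one direction, send or receive (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.delayed = deque()  # postponed (name, nbytes) operations

    def request(self, name, nbytes, started):
        """Start the operation if it can be pinned now, otherwise postpone it."""
        if nbytes > self.capacity:
            # A real library would split the message; the sketch just refuses.
            raise ValueError("operation can never fit in this budget")
        if not self.delayed and self.used + nbytes <= self.capacity:
            self.used += nbytes
            started.append(name)
            return True
        self.delayed.append((name, nbytes))
        return False

    def complete(self, nbytes, started):
        """Release pinned memory, then retry postponed operations in order."""
        self.used -= nbytes
        while self.delayed and self.used + self.delayed[0][1] <= self.capacity:
            name, size = self.delayed.popleft()
            self.used += size
            started.append(name)


started = []
send_area = PinBudget(8192)   # separate budgets, so sends cannot starve receives
recv_area = PinBudget(8192)

send_area.request("send-A", 8192, started)   # fills the send budget
send_area.request("send-B", 4096, started)   # postponed on the delayed queue
recv_area.request("recv-A", 4096, started)   # still proceeds
send_area.complete(8192, started)            # send-A done, send-B now starts
```

Because the receive budget is independent, `recv-A` starts even while the send budget is exhausted, which is the property the separation is meant to guarantee.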