Performance of a High-Level Parallel Language on a High-Speed Network

Bal, Henri E.; Bhoedjang, Raoul; Hofman, Rutger F. H.; Jacobs, Ceriel J. H.; Langendoen, Koen; Rühl, Tim; Verstoep, Kees

doi:10.1006/jpdc.1996.1265

Cited by 14 publications

(7 citation statements)

References 38 publications

“…However, a true solution would require the host and LANai extension to maintain the same view of the new window coordinates before the window can be moved. 6 Conclusions…”

Section: Discussion (mentioning)

confidence: 98%

“…The ORCA project demonstrated that application-specific multicast protocols perform better when handled directly from the network interface [6]. Lazy Receive Processing [2] requires operating system aware support directly on the network interface in order to process packets fairly on a per socket basis.…”

Section: Other Spine Uses (mentioning)

confidence: 99%

“…• Host/Device protocol partitioning. Low-level protocol support for application-specific multicast [6], packet filtering (e.g., DPF [12]) and quality of service (e.g., Lazy Receive Processing [2]) has been shown to significantly improve system performance. • Device-level memory management.…”

Section: Introduction (mentioning)

confidence: 99%

Spine

Fiuczynski, Martin, Owa, et al. 1998

Proceedings of the 8th ACM SIGOPS European Workshop on Support for Composing Distributed Applications

The emergence of fast, cheap embedded processors presents the opportunity to execute code directly on the network interface. We are developing an extensible execution environment, called SPINE, that enables applications to compute directly on the network interface. This structure allows network-oriented applications to communicate with other applications executing on the host CPU, peer devices, and remote nodes with low latency and high efficiency.

“…35 It is implemented on top of Panda, a communication library that provides RPC and totally ordered multicast at user-level. Panda has been recently ported to run on a network of eight 50 MHz Sparc workstations using a modified version of Fast Messages (in order to support software interrupt and multicast) over Myrinet.…”

Section: Orca and Fast Messages (mentioning)

confidence: 99%

MRPC: A high performance RPC system for MPMD parallel computing

Chang, Czajkowski, Eicken 1999

Softw. Pract. Exper.

MRPC is an RPC system that is designed and optimized for MPMD parallel computing. Existing systems based on standard RPC incur an unnecessarily high cost when used on high-performance multi-computers, limiting the appeal of RPC-based languages in the parallel computing community. MRPC combines the efficient control and data transfer provided by Active Messages (AM) with a minimal multithreaded runtime system that extends AM with the features required to support MPMD. This approach introduces only the necessary RPC overheads for an MPMD environment. MRPC has been integrated into Compositional C++ (CC++), a parallel extension of C++ that offers an MPMD programming model. Basic performance in MRPC is within a factor of two from those of Split-C, a highly tuned SPMD language, and other messaging layers. CC++ applications perform within a factor of two to six from comparable Split-C versions, which represent an order of magnitude improvement over previous CC++ implementations.

INTRODUCTION

Remote Procedure Call (RPC) [1] is widely used in distributed systems as the primary communication abstraction. In its most general form, an RPC specifies the data that is to be transferred and the remote operation that is to be performed with the data. Using a simple procedure call abstraction, the RPC initiator calls into a local stub, which marshals and transfers the data to the remote address space through a standard communication channel (e.g. pipes, streams, or sockets). A remote stub unmarshals the data and transfers control to a new thread that will execute the specified operation to assimilate the data. The result of the operation is sent back to the caller's address space through stubs, which then resumes computation. An RPC system typically consists of an IDL compiler for stub generation and a runtime system that interfaces with the operating system to perform data and control transfer.

Over the last decade, RPC has been extensively studied and optimized in operating systems. The focus gradually moved from the original inter-machine RPC [2] to local RPC [3-6] in which the role of the kernel is minimized during cross-domain calls on uniprocessor and shared-memory multiprocessor machines. The performance of RPC …

* It is tempting to simply send the hash value to the callee instead of the entire method name. This doesn't work because of possible collisions in the hash table.

“…When the buffer space is full, the packets may be lost if the software protocol does not implement a strategy to avoid it. The solutions proposed for this problem can be classified as solutions that allow recovery from buffer overflow and solutions that prevent buffer overflow [2,4,9]. In CLIC, flow control is implemented by a credit scheme, but it attempts to avoid blocking in the sender when there is still free space in the buffers.…”

Section: Clic and Previous Related Work (mentioning)

confidence: 99%

An efficient OS support for communication on Linux clusters

Díaz, Ortega, Fernández, et al.

Proceedings International Conference on Parallel Processing Workshops

A communication layer is proposed that, besides improving communication performance on clusters of PCs, by reducing the latencies and increasing the bandwidth figures even for short messages, also meets other requirements such as multiprogramming, portability, protection against corrupted programs, reliable message delivery, direct access to the network for all applications, etc. Instead of removing the operating system kernel from the critical path and creating a user-level network interface, our aim was to optimize the operating system support to provide reliable and efficient network software, avoiding the TCP/IP protocol stack. The communication system was tested in a cluster of PCs with Linux OS and interconnected with Fast Ethernet. The performance figures obtained define the best situation that can be attained without modifying the device drivers or using a user-level network interface approach.
