MRPC is an RPC system that is designed and optimized for MPMD parallel computing. Existing systems based on standard RPC incur an unnecessarily high cost when used on high-performance multi-computers, limiting the appeal of RPC-based languages in the parallel computing community. MRPC combines the efficient control and data transfer provided by Active Messages (AM) with a minimal multithreaded runtime system that extends AM with the features required to support MPMD. This approach introduces only the necessary RPC overheads for an MPMD environment.
MRPC has been integrated into Compositional C++ (CC++), a parallel extension of C++ that offers an MPMD programming model. The basic performance of MRPC is within a factor of two of that of Split-C, a highly tuned SPMD language, and of other messaging layers. CC++ applications perform within a factor of two to six of comparable Split-C versions, which represents an order of magnitude improvement over previous CC++ implementations.
INTRODUCTION

Remote Procedure Call (RPC) [1] is widely used in distributed systems as the primary communication abstraction. In its most general form, an RPC specifies the data that is to be transferred and the remote operation that is to be performed with the data. Using a simple procedure call abstraction, the RPC initiator calls into a local stub, which marshals the data and transfers it to the remote address space through a standard communication channel (e.g. pipes, streams, or sockets). A remote stub unmarshals the data and transfers control to a new thread that executes the specified operation to assimilate the data. The result of the operation is sent back through the stubs to the caller's address space, and the caller then resumes computation. An RPC system typically consists of an IDL compiler for stub generation and a runtime system that interfaces with the operating system to perform data and control transfer.

Over the last decade, RPC has been extensively studied and optimized in operating systems. The focus gradually moved from the original inter-machine RPC [2] to local RPC [3-6], in which the role of the kernel is minimized during cross-domain calls on uniprocessor and shared-memory multiprocessor machines. The performance of RPC

* It is tempting to simply send the hash value to the callee instead of the entire method name. This doesn't work because of possible collisions in the hash table.
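The stub-based call path described in the introduction can be illustrated with a minimal C++ sketch. It is not MRPC's implementation: the wire format, the names `rpc_call`, `server_stub`, and `dispatch_table`, and the two sample operations are all hypothetical. The "channel" is a direct function call, and the handler runs inline rather than on a new thread. Consistent with the footnote, the request carries the full method name rather than a hash of it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical wire format: [name length][name bytes][int32 a][int32 b].
using Buffer = std::vector<std::uint8_t>;

// Append a 32-bit value to the buffer (host byte order; not portable).
void put_u32(Buffer& buf, std::uint32_t v) {
    std::uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    buf.insert(buf.end(), raw, raw + sizeof v);
}

// Read a 32-bit value at `pos` and advance `pos` past it.
std::uint32_t get_u32(const Buffer& buf, std::size_t& pos) {
    assert(pos + sizeof(std::uint32_t) <= buf.size());
    std::uint32_t v;
    std::memcpy(&v, buf.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Callee side: table mapping full method names to handlers (assumed names).
using Handler = std::function<std::int32_t(std::int32_t, std::int32_t)>;

const std::map<std::string, Handler>& dispatch_table() {
    static const std::map<std::string, Handler> table = {
        {"add", [](std::int32_t a, std::int32_t b) { return a + b; }},
        {"mul", [](std::int32_t a, std::int32_t b) { return a * b; }},
    };
    return table;
}

// Remote stub: unmarshal the request, run the operation, marshal the reply.
// A real system would hand the operation to a new thread; here it runs inline.
Buffer server_stub(const Buffer& request) {
    std::size_t pos = 0;
    const std::uint32_t len = get_u32(request, pos);
    assert(pos + len <= request.size());
    const std::string name(request.begin() + pos, request.begin() + pos + len);
    pos += len;
    const auto a = static_cast<std::int32_t>(get_u32(request, pos));
    const auto b = static_cast<std::int32_t>(get_u32(request, pos));
    const std::int32_t result = dispatch_table().at(name)(a, b);
    Buffer reply;
    put_u32(reply, static_cast<std::uint32_t>(result));
    return reply;
}

// Local stub: marshal the arguments, "send" them, and unmarshal the result.
std::int32_t rpc_call(const std::string& name, std::int32_t a, std::int32_t b) {
    Buffer request;
    put_u32(request, static_cast<std::uint32_t>(name.size()));
    request.insert(request.end(), name.begin(), name.end());
    put_u32(request, static_cast<std::uint32_t>(a));
    put_u32(request, static_cast<std::uint32_t>(b));
    const Buffer reply = server_stub(request);  // stands in for the channel
    std::size_t pos = 0;
    return static_cast<std::int32_t>(get_u32(reply, pos));
}
```

With these definitions, `rpc_call("add", 2, 3)` passes through both stubs and returns 5. Every step in this path (marshaling, name lookup, copying buffers) adds cost to a plain procedure call, which is the kind of overhead the text refers to.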