SUMMARYThe use of a massively parallel machine is aimed at the development of applications programs to solve most significant scientific, engineering, industrial and commercial problems.High-performance computing technology has emerged as a powerful and indispensable aid to scientific and engineering research, product and process development, and all aspects of manufacturing. Such computational power can be achieved only by massively parallel computws. It also requires a new and more effective mode of interaction between the computational sciences and applications and those parts of computer science concerned with the development of algorithms and software. We are interested in using parallel processing to handle large numerical tasks such as linear algebra problems. Yet, programming such systems has proven itself to be very complicated, error-prone and architecture-specific.One successful method for alleviating this problem, a method that worked well in the case of the massively pipelined supercomputers, is to use subprogram libraries. These libraries are built to efficiently perform some basic operations, while hiding low-level system specifics from the programmer. Efficiently porting a library to a new hardware, be it a vector machine or a shared memory or message passing based multiprocessor, is a major undertaking. It is a slow process that requires an intimate knowledge of the hardware features and optimization issues.We propose a scheme for the creation of portable implementations of such libraries. We present an implementation of BLAS (basic linear algebra subprograms), which is used as a standard linear algebra library. Our parallel implementation uses the virtual machine for multiprocessors (VMMP) (1990), which is a software package that provides a coherent set of services for explicitly parallel application programs running on diverse MIMD multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallell program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementating the same services on all target multiprocessors. Software created using this scheme is automatically efecient on both categories of MIMD machines, and on any hardware V M M P has been ported to.An additional level of abstraction is achieved using the programming language C*, an object-oriented language. Eckel, Stroustrup, 1989Stroustrup, ,1986. For the programmer who is using BLAS-.3, it is hiding both the data structures used to define linear algebra objects, and the parallel nature of the operations performed on these objects.We icoded BLAS on top of VMMP. This code was run without any modifications on two shared memory machines-the commercial Sequent Symmetry and the experimental Taunop. (The code should run on any machine the VMMP was ported onto, given the availability of a C* compiler). Performance results for thii implementation are given. The speed-up of the BLAS-3 routines, tested on 22 processors of the Sequent, was...