This report is the PICL user's guide. It contains an overview of PICL and how it is used. Examples in C and Fortran are included. PICL is a subroutine library that can be used to develop parallel programs that are portable across several distributed-memory multiprocessors. PICL provides a portable syntax for key communication primitives and related system calls. It also provides portable routines to perform certain widely-used, high-level communication operations, such as global broadcast and global summation. Finally, PICL provides execution tracing that can be used to monitor performance or to aid in debugging.

This document contains examples and information needed for straightforward use of most of PICL's basic features. Full documentation of all PICL options and the various ways the library can be used is contained in a separate report [1]. The library is made up of three distinct sets of routines: a set of low-level communication and system primitives described in section 2, a set of high-level global communication routines whose use is described in section 3, and a set of routines for invoking and controlling the execution tracing facility, which is described in section 4. Each section contains examples in C showing typical uses of the respective routines. In addition, the Appendix contains FORTRAN versions of the examples and instructions for obtaining PICL and ParaGraph.

2. Low-Level Routines

The 12 low-level communication and system interface routines, described in Table 1, provide a portable syntax for message-passing programs. The PICL programming model assumes that the multiprocessor can send messages between arbitrarily chosen pairs of processors. The time required to send a message between two processors is a function of the interprocessor communication network, and the user will need to be aware of such machine dependencies in order to write efficient programs. Our model distinguishes one processor, the host, from the rest.
The user has access to the remaining processors, called node processors (or simply nodes), through the host. Typically, an application code consists of one program that runs on the host and another program that runs on each of the nodes. The host program calls PICL routines to allocate node processors, load the node program (or programs) onto the nodes, send input data required by the node programs, and receive results from the nodes.
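
The host/node division of labor described above can be illustrated with a small, self-contained C sketch. It does not use PICL at all: the mailbox array and the names mock_send, mock_recv, node_program, host_program, HOST_ID, MSG_INPUT, and MSG_RESULT are hypothetical stand-ins invented for this illustration, not PICL routines, and the nodes are simulated one after another in a single process rather than running concurrently on separate processors. A real PICL program would use the low-level routines listed in Table 1 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constants for this sketch; none of them come from PICL. */
enum { MAX_NODES = 8, QUEUE_LEN = 16, HOST_ID = MAX_NODES };
enum { MSG_INPUT = 1, MSG_RESULT = 2 };

typedef struct { int type; double value; } Message;
typedef struct { Message slot[QUEUE_LEN]; size_t head; size_t count; } Mailbox;

/* One FIFO mailbox per simulated node, plus one for the host. */
static Mailbox mailbox[MAX_NODES + 1];

/* Append a message to the destination's mailbox; returns -1 if it is full. */
static int mock_send(int dest, int type, double value)
{
    Mailbox *mb = &mailbox[dest];
    size_t tail;

    if (mb->count == QUEUE_LEN)
        return -1;
    tail = (mb->head + mb->count) % QUEUE_LEN;
    mb->slot[tail].type = type;
    mb->slot[tail].value = value;
    mb->count++;
    return 0;
}

/* Remove the oldest message if it has the requested type; -1 otherwise. */
static int mock_recv(int self, int type, double *value)
{
    Mailbox *mb = &mailbox[self];

    if (mb->count == 0 || mb->slot[mb->head].type != type)
        return -1;
    *value = mb->slot[mb->head].value;
    mb->head = (mb->head + 1) % QUEUE_LEN;
    mb->count--;
    return 0;
}

/* Node side: receive one input value, square it, return it to the host. */
static void node_program(int me)
{
    double x;

    if (mock_recv(me, MSG_INPUT, &x) == 0) {
        int rc = mock_send(HOST_ID, MSG_RESULT, x * x);
        assert(rc == 0);
        (void)rc;
    }
}

/* Host side: send input to every node, "run" the nodes, collect results.
 * Returns the sum of the node results, or -1.0 for an invalid node count. */
double host_program(int nnodes)
{
    double total = 0.0, partial;
    int node;

    if (nnodes < 1 || nnodes > MAX_NODES)
        return -1.0;
    memset(mailbox, 0, sizeof mailbox);   /* start with empty mailboxes */

    for (node = 0; node < nnodes; node++) {
        int rc = mock_send(node, MSG_INPUT, (double)(node + 1));
        assert(rc == 0);
        (void)rc;
    }
    for (node = 0; node < nnodes; node++)
        node_program(node);               /* sequential stand-in for loading */
    for (node = 0; node < nnodes; node++)
        if (mock_recv(HOST_ID, MSG_RESULT, &partial) == 0)
            total += partial;
    return total;
}
```

Calling host_program(4) from a main function sends the inputs 1 through 4 to four simulated nodes and adds up the squares they return. The point of the sketch is only the shape of the exchange: the host distributes input, each node works on its own data, and the host gathers the results.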