Abstract: Minimizing communication and synchronization costs is crucial to realizing the performance potential of parallel computers. This paper presents a general technique that uses a global data-flow framework to optimize communication and synchronization in the context of the one-way communication model. In contrast to the conventional send/receive message-passing communication model, one-way communication is a paradigm that decouples message transmission from synchronization. On parallel machines with appropriate low-level support, this may open up new opportunities not only to further optimize communication, but also to reduce synchronization overhead. We present optimization techniques, built on our framework, for eliminating redundant data communication and synchronization operations. Our approach works with the most general data alignments and distributions in languages like High Performance Fortran (HPF) and uses a combination of traditional data-flow analysis and polyhedral algebra. Empirical results for several scientific benchmarks on a Cray T3E multiprocessor demonstrate that our approach is successful in reducing the number of data (communication) and synchronization messages, thereby reducing overall execution times.

Although presented in the context of distributed memory machines, these techniques are readily applicable to uniform shared memory architectures as well. Gupta and Schonberg [19] show that compilers that generate code for one-way communication can exploit shared-memory architectures with flexible cache-coherence protocols (e.g., Wisconsin Typhoon [45] and Stanford FLASH [27]).
To measure the benefits obtained from one-way communication (see Section 7), we used a Cray T3E [47], a logically shared, physically distributed memory multiprocessor that supports the PVM [15] and MPI [41] message-passing libraries, as well as a simple one-sided communication library provided by Silicon Graphics Inc.

The Put primitive, executed by the producer of the data, transfers the data from the producer's memory to the consumer's memory. This operation is very similar to the execution of a Send primitive by the producer and of a matching Recv primitive by the consumer. There is an important difference, however: the consumer processor is not involved directly in the transfer, and all the communication parameters are supplied by the producer [41].

As stated above, synchronization operations might be necessary to ensure correctness. A variety of synchronization operations can be used to preserve the semantics of the program, including barriers, point-to-point (or producer-consumer) synchronizations, and locks. The synchronization primitive used in this paper, namely Synch, is executed by the producer of the data and is a point-to-point primitive; our approach can, however, be modified to work with other types of synchronization as well. Note that both Stricker et al. [48] and Hayashi et al. [26] use barriers to implement synchronization. In contrast, our effort is aimed at reducing the tot...
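The decoupling of data transfer from synchronization described above can be modeled with a short sketch. This is not the Cray/SGI one-sided library itself; it is a hypothetical thread-based analogue in which a shared dictionary stands in for the consumer's memory, a direct write models Put, and an event models the point-to-point Synch. As in the real Put primitive, the consumer takes no part in the transfer and waits only on the synchronization:

```python
import threading

consumer_memory = {}        # stands in for the consumer's address space
synch = threading.Event()   # point-to-point synchronization (models Synch)

def producer():
    # Put: the producer writes directly into the consumer's memory,
    # supplying all communication parameters itself.
    consumer_memory["x"] = 42
    # Synch: signal the consumer that the data is now valid.
    synch.set()

def consumer(result):
    # Unlike a matching Recv, the consumer is not involved in the
    # transfer; it only waits on the synchronization point.
    synch.wait()
    result.append(consumer_memory["x"])

result = []
t_cons = threading.Thread(target=consumer, args=(result,))
t_prod = threading.Thread(target=producer)
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(result[0])  # -> 42
```

Separating the two operations in this way is what creates the optimization opportunities the paper exploits: a Synch that is already implied by an earlier one, or by program order, can be eliminated without touching the data transfer itself.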