Message passing is a popular style of parallel programming, used in a wide variety of applications and supported by many APIs, such as BSD sockets, MPI and PVM. Its importance has motivated a significant body of research on optimization and debugging techniques for such applications. Although this work has produced impressive results, it has not yet fulfilled its full potential. The reason is that while prior work has focused on runtime techniques, there has been very little work on compiler analyses that understand the properties of parallel message passing applications and use this information to improve application performance and the quality of debugging tools.

This paper presents a novel compiler analysis framework that extends dataflow analysis to parallel message passing applications running on arbitrary numbers of processes. It works on an extended control-flow graph that captures all possible inter-process interactions for any number of processes. This enables dataflow analyses built on top of this framework to incorporate information about the application's parallel behavior and communication topology. The parallel dataflow framework can be instantiated with a variety of specific dataflow analyses, as well as abstractions that trade the accuracy of communication topology detection against its cost.

The proposed framework bridges the gap between prior work on parallel runtime systems and sequential dataflow analyses, enabling new transformations, runtime optimizations and bug detection tools that require knowledge of the application's communication topology. We instantiate the framework with two different symbolic analyses and show how they detect different types of communication patterns, which enables the use of dataflow analyses on a wide variety of real applications.
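As a concrete illustration of the communication topologies such an analysis must recover (this sketch is ours, not drawn from the paper), consider a minimal MPI program in which each process's communication partners are symbolic functions of its rank. Detecting that the sends and receives below form a ring is exactly the kind of topology information a parallel dataflow analysis would propagate.

    /* Illustrative sketch: a ring exchange whose topology a parallel
     * dataflow analysis would need to recover statically from the
     * rank arithmetic below. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, recv_val;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size;        /* destination: next rank in the ring */
        int left  = (rank - 1 + size) % size; /* source: previous rank in the ring */

        /* Each process sends its rank to the right neighbor and receives
         * from the left one; the endpoints are symbolic expressions over
         * rank, which is what topology detection must reason about. */
        MPI_Sendrecv(&rank, 1, MPI_INT, right, 0,
                     &recv_val, 1, MPI_INT, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d received %d\n", rank, recv_val);
        MPI_Finalize();
        return 0;
    }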
I. Introduction

The rise of cluster and multi-core computing has driven a revolution in computer hardware and software. Since these machines are best utilized by applications explicitly written to take advantage of multiple processing cores, their popularity is pushing the development of a wide variety of parallel applications. In particular, message passing applications, where processes use distributed memory and communicate via explicit send and receive operations, have become very popular. These applications are common on a wide variety of platforms and use popular APIs such as BSD sockets, MPI and PVM.

The past few decades have seen a significant amount of work on optimizing message passing applications. The bulk of it has focused on runtime techniques that either improve the underlying hardware (e.g., network processors [12]) or the software infrastructure (e.g., self-tuning MPI [6]), or use run-time information to analyze and tune application performance [11]. In contrast, work on static techniques to analyze and optimize such applications has been mostly limited to sequential analyses that either improve the performance of the sequential portions of these applications or improve parallel performance by looking at th...