Executive Summary

In software running on distributed computing clusters, communication between nodes can account for a significant portion of the overall computation time. Background operating system tasks and other computational "noise" on the nodes can significantly increase the time this communication takes, especially on large systems. The research completed in this period has improved our understanding of when such noise has a significant impact. Specifically, we demonstrated that noise on the network between nodes, and not only noise on the nodes themselves, can have a significant impact on computation time [3]. We also demonstrated that noise patterns matter more than noise intensity: very regular noise can cause less disruption than noise that is lighter on average but less regular [9]. Finally, we demonstrated that the effect of noise becomes more prominent as the speed of the network between nodes increases [9].

Furthermore, a tracing tool, Netgauge, was improved through our work, and a system simulator, LogGOPSim, was developed. Application developers can use them to improve the performance of their programs, and system designers can use them to mitigate the effects of noise by adjusting the noise characteristics of the operating system. Both have been made freely available as open source programs. In the course of developing these tools, we demonstrated weaknesses in existing methodologies for modeling communication [1], and we introduced a more detailed model, LogGOPS, for simulating systems.

We have not only explored the deleterious effects of noise but also offered solutions. Our simulation studies of system noise have led to specific recommendations on tuning systems to mitigate noise. We have also improved existing approaches to mitigating noise.
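
The claim that noise patterns can matter more than noise intensity can be illustrated with a toy model. The sketch below is not LogGOPSim and does not reproduce the results in [9]. It assumes a bulk-synchronous program in which every step ends at a global synchronization, so each step lasts as long as its slowest process. All names and parameter values (`bsp_runtime`, `WORK`, `HIT_PROB`, and so on) are hypothetical and chosen only for illustration. "Regular" noise is assumed to delay every process by the same small amount in every step, while "irregular" noise is assumed to strike each process rarely and independently, with a long detour.

```python
import random

NUM_PROCS = 1024   # assumed number of processes
NUM_STEPS = 200    # assumed number of compute/synchronize steps
WORK = 1.0         # assumed compute time per step, in arbitrary units

REGULAR_DELAY = 0.1   # regular noise: 10% overhead on every process, every step
HIT_PROB = 0.01       # irregular noise: chance that a process is hit in a step
HIT_LEN = 5.0         # irregular noise: length of one detour


def bsp_runtime(num_procs, num_steps, work, delay_fn):
    """Total runtime when each step ends once the slowest process is done."""
    total = 0.0
    for step in range(num_steps):
        total += max(work + delay_fn(rank, step) for rank in range(num_procs))
    return total


def regular_noise(rank, step):
    # Same delay everywhere, so no process ever waits for another.
    return REGULAR_DELAY


rng = random.Random(0)  # fixed seed so the run is repeatable


def irregular_noise(rank, step):
    # Rare, long, uncoordinated detours.
    return HIT_LEN if rng.random() < HIT_PROB else 0.0


regular_intensity = REGULAR_DELAY / WORK         # average overhead per process
irregular_intensity = HIT_PROB * HIT_LEN / WORK  # average overhead per process

regular_total = bsp_runtime(NUM_PROCS, NUM_STEPS, WORK, regular_noise)
irregular_total = bsp_runtime(NUM_PROCS, NUM_STEPS, WORK, irregular_noise)

print("regular:   intensity", regular_intensity, "runtime", regular_total)
print("irregular: intensity", irregular_intensity, "runtime", irregular_total)
```

Under these assumptions the irregular noise has half the average intensity of the regular noise, yet with 1024 processes almost every step contains at least one process that is hit, so nearly every step pays for a full detour. The effect grows with the number of processes, which is consistent with the observation that noise matters most on large systems.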
"Non-blocking collective communication" avoids the effects of noise by letting communication continue simultaneously with computation (thus being "non-blocking"), so that the delays in communication introduced by noise have a smaller impact on overall computation time. Potentially, noise can be reduced much further by "offloading" communication tasks to a separate processing element than the operating system is using.We have improved our library LibNBC, which provides an implementation of non-blocking collectives, via this work. During this research, our proposal to include non-blocking collectives (which used LibNBC as a reference implementation) in the upcoming MPI-3 standard was accepted. As MPI is a ubiquitous and important standard for communication in parallel computing, this demonstrates a certain acceptance of the practicality and desirability of non-blocking collectives. Now that non-blocking collectives are a part of the standard we can expect to see optimized platform-specific implementations of non-blocking collectives. Also as part of this work we have also developed a language GOAL (Global Operation Assembly Language) that can be used as a starting point for defining languages to express optimized com...