Performance analysis and prediction of parallel applications using the Message-Passing Interface (MPI) standard is a challenging task. Collecting, organizing, and making sense of profiling data for MPI jobs of even modest scale is difficult and time-consuming. The task is further complicated by the inherent difficulty of interpreting the resulting communication measurements. In this paper we introduce MPInside, a new profiling and diagnostic tool that overcomes these difficulties with carefully considered choices of measurement techniques, capabilities, and output formats. Using examples from real-world applications, we illustrate the innovative features of the tool, including late senders for point-to-point calls and unaligned collective calls, all in an instrumentation-free framework. We also demonstrate the in-flight modeling capabilities of MPInside with several "what if" experiments.

The MPInside project began as an investigation of parallel applications using the MPI standard. With a classical profiling tool, one measures the time an application spends in the user code versus the MPI library. Usually, when the computation time dominates, the application scales well. On the other hand, a large percentage of communication time typically indicates poor parallel efficiency. One would naively believe that better communication hardware directly translates into reduced communication time and better parallel efficiency.