Novel views of performance data to analyze large-scale adaptive applications

Bhatelé, Abhinav; Gamblin, Todd; Isaacs, Katherine E.; Gunney, Brian T. N.; Schulz, Martin; Bremer, Peer-Timo; Hamann, Bernd

doi:10.1109/sc.2012.80

Cited by 8 publications

(3 citation statements)

References 15 publications

“…This method is restrictive since it prevents any other application from running on the system during the experiments. Furthermore, this and other Boxfish related works, such as [25] and [7], dealt with only tori topologies and not fat-tree or any other network topology. Our approach records application-specific performance metrics within the MPI library and can be used on shared nodes and shared networks.…”

Section: Related Work (mentioning)

confidence: 99%

Hardware-Centric Analysis of Network Performance for MPI Applications

Brown

Domke

Matsuoka

2015

2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)

As the scale of high-performance computing systems increases, optimizing inter-process communication becomes more challenging while being critical for ensuring good performance. However, the hardware layer abstraction provided by MPI makes it difficult to study application communication performance over the network hardware, especially for collective operations. We present a new approach to network performance analysis based on exposing low-level communication metrics in a flexible manner and conducting hardware-centric analysis of these metrics. We show how low-level network metrics can be revealed using Open MPI's Peruse utility, without interfacing with the hardware layer. A lightweight profiler, ibprof, was developed to aggregate these metrics from message passing events at a cost of <1% runtime overhead for communication in NPB kernel and application benchmarks. We also developed a flexible visualization module for the Boxfish analysis tool to analyze our communication profile over the physical topology of the network. Using case studies, we demonstrate how our approach can identify communication anomalies in network applications and guide performance optimization strategies.

“…They achieved a considerable improvement on a Cray XK6 system. Bhatele et al [9] used a binary tree to visualize communication topology which facilitates a diagnosis of communication and workload distribution. They showed the performance improvement for a large AMR application on an IBM Blue Gene/P system.…”

Section: Related Work (mentioning)

confidence: 99%

CommGram: A New Visual Analytics Tool for Large Communication Trace Data

Zeng

et al.

2014

2014 First Workshop on Visual Performance Analysis

The performance of massively parallel programs is often impacted by the cost of communication across computing nodes. Analysis of communication patterns is critical for understanding and optimizing massively parallel programs. Visualization can help identify potential communication bottlenecks by displaying message trace data. However, existing visualization tools typically suffer from visual clutter and temporal incoherence when displaying a considerable number of processors. In this paper, we present a new tool, named CommGram, which supports visual analysis of communication patterns for massively parallel MPI programs. Using the MPI trace library DUMPI of SST, our framework builds hierarchical clustering trees for the computational community domain and takes advantage of a graphical user interface (GUI) to convey communication patterns at different levels of detail. The effectiveness of our tool is demonstrated using large-scale parallel applications.

“…Boxfish's 2D torus view has been used to better understand network behavior [6], [7] in pF3D [8], [9], a multi-physics laser-plasma interaction simulation. The view showed the differences in traffic load in the various torus directions given various node mappings.…”

Section: Boxfish In Practice (mentioning)

confidence: 99%

Abstract: Exploring Performance Data with Boxfish

Isaacs

Landge

Gamblin

et al.

2012

2012 SC Companion: High Performance Computing, Networking Storage and Analysis

Self Cite

The growth in size and complexity of scaling applications and the systems on which they run pose challenges in analyzing and improving their overall performance. With metrics coming from thousands or millions of processes, visualization techniques are necessary to make sense of the increasing amount of data. To aid the process of exploration and understanding, we announce the initial release of Boxfish, an extensible tool for manipulating and visualizing data pertaining to application behavior. Combining and visually presenting data and knowledge from multiple domains, such as the application's communication patterns and the hardware's network configuration and routing policies, can yield the insight necessary to discover the underlying causes of observed behavior. Boxfish allows users to query, filter and project data across these domains to create interactive, linked visualizations.

I. PROJECTING DATA ACROSS DOMAINS

We describe the association of elements that exist in one domain with the elements of another as a projection. A map file which associates integer MPI ranks with coordinate-denoted hardware nodes and threads is an example of a commonly used projection. Schulz et al. [1] advocated the use of projections in interpreting performance data and defined three domains of interest: hardware, application and communication. The hardware domain includes performance counters. The application domain includes information relating to the application, such as physics measurements in a simulation or matrix properties in a linear algebra library. The communication domain includes messages sent among subsets of processors. Boxfish recognizes these domains by default, but contributed modules may add others.

Boxfish is designed to support the projection of data across domains. When filters or queries are written requiring attributes from multiple domains, or when a view requires attribute information in a native domain, Boxfish searches its available projections to make the necessary transformations. This allows users to view data such as the load on nodes which had a certain range of values in a previous run or the average wait time for communicators in a particular phase of the application. Data tables may have default preferred projections. Projections can be added from files, created based on data attributes, or composed from existing ones. More projections may be added through future or contributed modules. Figure 1 shows a projection from the communication domain on...

Fig. 1. A 3D torus network represented in 2D (left) and 3D (right). Both views represent elements of the hardware domain. However, nodes are colored by their sub-communicators, which belong to the communication domain. Links are colored by the number of packets sent over them. These views are rendered side by side in Boxfish, indicating they are siblings in the filter hierarchy and show the same data. In the 2D view, selected nodes are displayed at a slightly larger size. In the 3D view, the same nodes are selected and highlighted by their relative opacity.
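The idea of a projection described in the abstract above can be illustrated with a small sketch. This is not Boxfish's API: the function names `project` and `compose`, the rank-to-node map, the node-to-rack map, and the wait-time values are all hypothetical and chosen only to show how a map file can move per-rank data into the hardware domain and how two projections can be composed.

```python
from collections import defaultdict

# Hypothetical map file contents: MPI rank -> (x, y, z) node coordinates.
rank_to_node = {0: (0, 0, 0), 1: (0, 0, 0), 2: (1, 0, 0), 3: (1, 0, 0)}

# Hypothetical second projection: node coordinates -> rack label.
node_to_rack = {(0, 0, 0): "R00", (1, 0, 0): "R01"}

# Made-up per-rank measurements (for example, wait time in seconds).
wait_time = {0: 1.5, 1: 0.5, 2: 2.0, 3: 4.0}


def project(values, projection):
    """Group values keyed by source-domain elements under the
    target-domain element that each source element maps to."""
    grouped = defaultdict(list)
    for source, value in values.items():
        grouped[projection[source]].append(value)
    return dict(grouped)


def compose(first, second):
    """Compose projections a -> b and b -> c into a single a -> c map."""
    return {a: second[b] for a, b in first.items() if b in second}


# Average wait time per hardware node, obtained by projecting rank data.
node_wait = {
    node: sum(vals) / len(vals)
    for node, vals in project(wait_time, rank_to_node).items()
}

# A rank -> rack projection built from the two existing ones.
rank_to_rack = compose(rank_to_node, node_to_rack)
```

Here `node_wait` holds one averaged value per node coordinate, which is the kind of quantity a hardware-domain view could color nodes by, and `rank_to_rack` shows a projection "composed from existing ones" in the sense the abstract mentions.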
