Performance profiling tools are crucial for HPC specialists to identify performance bottlenecks in parallel codes at various levels of granularity (i.e., across nodes, ranks, and threads). Although numerous sophisticated profiling tools have been developed, achieving scalable performance introspection at large scale remains a challenge. This is particularly evident in efficiently writing profiles to disk at runtime and subsequently reading them back for post hoc analysis with constrained computing resources. In this paper, we present TinyProf, a performance introspection framework that tackles the I/O-related challenges of profiling performance data at scale. TinyProf's scalability stems from an optimized runtime built on three key components: (1) an efficient in-memory data structure that minimizes memory consumption and reduces communication overhead during parallel file I/O; (2) a customizable three-phase I/O scheme that generates I/O patterns capable of scaling to high core counts; and (3) a streamlined profile data format that guarantees minimal profile file sizes. Together, these techniques keep the profiler's overhead low (below 5%) even at high process counts. This low overhead makes it feasible to run the profiler with an application by default, whenever the application is running, enabling continuous performance introspection. We demonstrate the efficiency of our framework on large-scale parallel applications and perform a thorough evaluation against existing systems at up to 32k processes.
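To make the three-phase I/O scheme concrete, the following is a minimal sketch of a common "serialize, aggregate, then collectively write" pattern, assuming an MPI-based design; the aggregation group size, file name, and payload are hypothetical, and this is not TinyProf's actual implementation.

```cpp
// Sketch of a three-phase I/O pattern for scalable profile output.
// Group size G, file name, and payload contents are illustrative assumptions.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Phase 1: each rank serializes its profile into a compact byte buffer.
    // Here we fake a small per-rank payload; a real profiler would emit its
    // in-memory records in a compact binary format.
    std::vector<char> local(64 + rank % 16, static_cast<char>('A' + rank % 26));

    // Phase 2: gather buffers onto one aggregator per group of G ranks,
    // reducing the number of processes that touch the filesystem.
    const int G = 32;  // assumed aggregation group size
    MPI_Comm group;
    MPI_Comm_split(MPI_COMM_WORLD, rank / G, rank, &group);
    int grank, gsize;
    MPI_Comm_rank(group, &grank);
    MPI_Comm_size(group, &gsize);

    int mybytes = static_cast<int>(local.size());
    std::vector<int> counts(gsize), displs(gsize);
    MPI_Gather(&mybytes, 1, MPI_INT, counts.data(), 1, MPI_INT, 0, group);

    std::vector<char> agg;
    if (grank == 0) {
        int total = 0;
        for (int i = 0; i < gsize; ++i) { displs[i] = total; total += counts[i]; }
        agg.resize(total);
    }
    MPI_Gatherv(local.data(), mybytes, MPI_BYTE,
                agg.data(), counts.data(), displs.data(), MPI_BYTE, 0, group);

    // Phase 3: aggregators compute disjoint file offsets via a prefix sum
    // and each issues one collective write into a single shared file.
    MPI_Comm writers;
    MPI_Comm_split(MPI_COMM_WORLD, grank == 0 ? 0 : MPI_UNDEFINED, rank, &writers);
    if (grank == 0) {
        long long mine = static_cast<long long>(agg.size()), end = 0;
        MPI_Scan(&mine, &end, 1, MPI_LONG_LONG, MPI_SUM, writers);
        long long offset = end - mine;  // exclusive prefix: start of my region

        MPI_File fh;
        MPI_File_open(writers, "profile.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at_all(fh, static_cast<MPI_Offset>(offset),
                              agg.data(), static_cast<int>(agg.size()),
                              MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        MPI_Comm_free(&writers);
    }
    MPI_Comm_free(&group);
    MPI_Finalize();
    return 0;
}
```

The key design point this sketch illustrates is that only one rank per group acts as a filesystem client, so the number of concurrent writers grows with node count rather than core count, which is typically what limits I/O scalability at tens of thousands of processes.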