2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC) 2019
DOI: 10.1109/hipc.2019.00022

Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters

Cited by 14 publications (4 citation statements); references 10 publications.

“…In addition, there are tools built on top of existing NVIDIA libraries. For example, nvtop [6] is a wrapper around NVML that provides visualization for NVML metrics, and Moneo [24] is a monitoring system that specifically targets AI applications.…”
Section: Related Work (mentioning)

“…Orthogonal to Nsight Systems and Nsight Compute, there have also been efforts to build profiling tools using the NVIDIA CUDA Profiling Tools Interface (CUPTI API [26]) [12,20,24,31,38,39]. Furthermore, there are tools that have a stronger focus on profiling the data movement, such as [20], which is also built on top of CUPTI in addition to OSU INAM [33].…”
Section: Related Work (mentioning)

“…Welton and Miller [22] investigated hidden performance issues that impact several HPC applications but are not reported by tool APIs. Kousha et al. [23] developed a tool for monitoring communications on multiple GPUs. Unlike the aforementioned tools, HPCToolkit collects call path profiles and shows calling context information in both trace and profile views.…”
Section: Related Work (mentioning)

“…Figure 1 presents an overview of INAM. INAM is capable of profiling and monitoring large-scale InfiniBand networks with low overhead [5], profiling GPU and CPU intra-node communication [4], and profiling both MPI and the job scheduler [6]. Therefore, INAM supports profiling HPC/DL applications, MPI communication runtimes, and the job scheduler.…”
Section: Introduction (mentioning)