SC20: International Conference for High Performance Computing, Networking, Storage and Analysis 2020
DOI: 10.1109/sc41405.2020.00093

GVPROF: A Value Profiler for GPU-Based Clusters

Cited by 17 publications (9 citation statements)
References 35 publications
“…Diogenes [50] overloads GPU memory copy APIs to analyze duplicate values copied to the GPU but it does not analyze patterns of value use by GPU kernels. The most related approach is GVProf [58], a value profiler for NVIDIA GPUs. While GVProf can identify value redundancies, it does not systematically categorize value patterns and cannot identify as many inefficiencies as ValueExpert can, as we describe in Section 7.…”
Section: Related Work (mentioning)
confidence: 99%
“…ValueExpert's offline analyzer adopts a bidirectional slicing algorithm [58] that derives a GPU memory instruction's access type based on instructions with known access types on its def-use chains.…” [Footnote 1: Based on our experiments, we use a threshold of 33%.]
Section: Value Pattern Recognition (mentioning)
confidence: 99%
“…For example, Goroshov et al [27] use instrumentation to measure basic block latency and detect hot code regions. GVProf [28] instruments GPU memory instructions to profile value redundancies. In HPCToolkit, we use GT-Pin to measure instruction counts within GPU kernels.…”
Section: Related Work (mentioning)
confidence: 99%
“…Using binary instrumentation to collect instruction traces, we could reconstruct a GPU calling context tree on the CPU as an application executes. However, this method would have high overhead with frequent communication between CPUs and GPUs to copy traces [28].…”
Section: GPU Calling Context Tree (mentioning)
confidence: 99%