Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO 2018)
DOI: 10.1145/3179541.3168831
CUDAAdvisor: LLVM-based runtime profiling for modern GPUs

Abstract: General-purpose GPUs have been widely utilized to accelerate parallel applications. Given a relatively complex programming model and fast architecture evolution, producing efficient GPU code is nontrivial. A variety of simulation and profiling tools have been developed to aid GPU application optimization and architecture design. However, existing tools either provide insufficient insight or lack support across different GPU architectures, runtime and driver versions. This paper presents CUDAAdvisor…
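To make the idea of instrumentation-based runtime profiling concrete, the hand-written CUDA sketch below shows the kind of per-access hook that an LLVM-level instrumentation pass conceptually inserts before each global load and store. The hook, counter names, and toy SAXPY kernel are all hypothetical illustrations; a tool such as CUDAAdvisor injects comparable probes automatically during LLVM code generation rather than requiring source changes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Global counters updated by the instrumentation hook. In a real
// LLVM-based tool, updates like these are injected into the compiler's
// intermediate representation automatically, not written by hand.
__device__ unsigned long long g_loads  = 0;
__device__ unsigned long long g_stores = 0;

// Hypothetical per-access hook: records one global memory operation.
__device__ void record_access(const void *addr, bool is_store) {
    (void)addr;  // a real profiler would also log the address for reuse/locality analysis
    atomicAdd(is_store ? &g_stores : &g_loads, 1ULL);
}

// Toy SAXPY kernel with hand-inserted hooks before each global access.
__global__ void saxpy_instrumented(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        record_access(&x[i], false);   // load x[i]
        record_access(&y[i], false);   // load y[i]
        float v = a * x[i] + y[i];
        record_access(&y[i], true);    // store y[i]
        y[i] = v;
    }
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    saxpy_instrumented<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    unsigned long long loads = 0, stores = 0;
    cudaMemcpyFromSymbol(&loads, g_loads, sizeof(loads));
    cudaMemcpyFromSymbol(&stores, g_stores, sizeof(stores));
    std::printf("global loads: %llu, stores: %llu\n", loads, stores);

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

In a real profiler the hook would record full address traces rather than bare counts, feeding analyses such as the cache-bypassing decisions mentioned in the citing papers below.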

Cited by 5 publications (6 citation statements)
References 38 publications
“…This is especially desired when porting traditional CPU-based HPC applications onto the new GPU-based exascale systems, such as Summit [6], Sierra [7] and Perlmutter [37]. As part of the community effort, we are planning to pursue these research directions in our future work with our past experience on GPU analytic modeling [38], [39], [40] and performance optimization [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51].…”
Section: Discussion
confidence: 99%
“…Compared to multithreaded shared-memory programs on CPUs, it is relatively complex to write efficient CUDA programs and utilize the GPU memory hierarchy. Several performance profiling tools help optimize CUDA programs [9,47,60,62], but these techniques do not help with concurrency correctness.…”
Section: Race Detection and Program Analyses on GPUs
confidence: 99%
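The point about the GPU memory hierarchy is easy to illustrate. The CUDA sketch below is a standard, hypothetical example (not taken from any of the cited papers): a naive matrix transpose whose strided global stores are uncoalesced, next to a shared-memory tiled version in which both loads and stores stay coalesced. Kernel names and the tile size are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 32

// Naive transpose: global loads are coalesced, but the stores to `out`
// are strided by n, so each warp scatters its writes across 32 cache lines.
__global__ void transpose_naive(const float *in, float *out, int n) {
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        out[x * n + y] = in[y * n + x];
}

// Tiled transpose: stage the tile in shared memory so that both the global
// load and the global store are coalesced; the +1 padding avoids
// shared-memory bank conflicts on the transposed read.
__global__ void transpose_tiled(const float *in, float *out, int n) {
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];
    __syncthreads();

    // Swap the block indices so the write is contiguous along threadIdx.x.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];
}

int main() {
    const int n = 1024;
    float *in, *out;
    cudaMalloc(&in,  n * n * sizeof(float));
    cudaMalloc(&out, n * n * sizeof(float));
    cudaMemset(in, 0, n * n * sizeof(float));

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transpose_naive<<<grid, block>>>(in, out, n);
    transpose_tiled<<<grid, block>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Spotting that the naive kernel's stores are the bottleneck is exactly the kind of insight instrumentation-based profilers aim to surface automatically.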
“…Yeh et al [8] instrument GPU code as it is generated by LLVM to identify redundant instructions. CUDAAdvisor [32] also instruments code as it is generated by LLVM to monitor GPU memory access and decide if bypassing could be used. GVProf [4] instruments GPU binaries to detect both temporal and spatial redundant value patterns.…”
Section: Related Work
confidence: 99%
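As a rough, hypothetical illustration of the value-redundancy analyses mentioned above, the host-side sketch below (ordinary C++ as it might sit beside instrumented CUDA code) flags a load as temporally redundant when it returns the same value as the previous load of the same address. The trace format and function names are invented for this example and only loosely mirror what tools such as GVProf actually do.

```cuda
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// One record that an (assumed) instrumentation hook could emit for every
// global load: which address was read and which value came back.
struct LoadRecord {
    std::uintptr_t addr;
    std::uint32_t  value;
};

// Count temporally redundant loads: a load is redundant if it returns the
// same value that the previous load of the same address returned. Real tools
// track far more context (PC, warp, spatial neighbours, and so on).
static std::size_t count_redundant_loads(const std::vector<LoadRecord> &trace) {
    std::unordered_map<std::uintptr_t, std::uint32_t> last_value;
    std::size_t redundant = 0;
    for (const LoadRecord &r : trace) {
        auto it = last_value.find(r.addr);
        if (it != last_value.end() && it->second == r.value)
            ++redundant;
        last_value[r.addr] = r.value;
    }
    return redundant;
}

int main() {
    // Tiny synthetic trace: address 0x100 is read twice and returns the same
    // value both times, so exactly one load is flagged as redundant.
    std::vector<LoadRecord> trace = {
        {0x100, 42}, {0x104, 7}, {0x100, 42}, {0x104, 9},
    };
    std::printf("redundant loads: %zu\n", count_redundant_loads(trace));
    return 0;
}
```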
“…Prior tools on GPUs [4, 8, 32] provide fine-grained suggestions using instrumentation-based methods to quantify the severity of performance problems and locate problematic code. These tools identify one or a few patterns, such as redundant value/address, insufficient cache utilization, or memory transaction burst, but overlook others.…”
Section: Introduction
confidence: 99%