2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
DOI: 10.1109/cgo53902.2022.9741290
Efficient Execution of OpenMP on GPUs

Cited by 25 publications (4 citation statements)
References 21 publications
“…Though CUDA enables granular control of parallelization, it generally requires a complete rewrite of the code, which can be a major disadvantage when optimized serial codes are available. Alternatives like directive-based approaches, such as OpenMP [27] and OpenACC [28], use #pragma directives to annotate potentially parallelizable portions of the code and can be used to target GPUs starting from a serial source code, substantially reducing development time and effort.…”
Section: Results
confidence: 99%
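
To make the directive-based style concrete, here is a minimal sketch (the saxpy kernel and array names are illustrative, not taken from the cited papers): a serial loop annotated with OpenMP target directives so the same source can be offloaded to a GPU, e.g. with clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda.

#include <stdio.h>

/* Serial saxpy loop annotated for GPU offload: 'target' moves execution
 * to the device, 'map' describes the data transfers, and
 * 'teams distribute parallel for' parallelizes the iterations. */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    enum { N = 1024 };
    float x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy(N, 3.0f, x, y);
    printf("y[0] = %f\n", y[0]); /* expected: 5.000000 */
    return 0;
}

If no device is available (or the pragma is ignored by a compiler without OpenMP support), the loop still executes serially on the host, which is exactly the incremental-porting property the statement highlights.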
“…This directive signifies that the subsequent 'for' loop will be executed in a multi-threaded fashion, provided there is no interdependence between loop iterations. Upon completing their respective tasks, the threads in the team wait at an implicit barrier at the end of the single construct, unless a 'nowait' clause is specified [9].…”
Section: OpenMP
confidence: 99%
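
A minimal sketch of the barrier semantics described in that statement (the work inside the loop is a hypothetical placeholder): iterations of the 'for' construct are divided among the team, an implicit barrier follows the loop, and 'nowait' on the 'single' construct lets the other threads proceed without waiting.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int data[100];
    #pragma omp parallel
    {
        /* Loop iterations are divided among the team of threads;
         * an implicit barrier follows the loop unless 'nowait' is given. */
        #pragma omp for
        for (int i = 0; i < 100; ++i)
            data[i] = i * i;

        /* Exactly one thread executes this block; 'nowait' removes the
         * implicit barrier at its end, so the other threads move on
         * immediately instead of waiting here. */
        #pragma omp single nowait
        printf("data[99] = %d (printed by thread %d)\n",
               data[99], omp_get_thread_num());
    }
    return 0;
}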
“…In recent work we implemented the loop transformation constructs introduced in OpenMP 5.1 [70,71], asynchronous offloading for OpenMP [132], efficient lowering of idiomatic OpenMP code to GPUs (under review), OpenMP-aware compiler optimizations with informative and actionable remarks for users (under review), a portable OpenMP device (=GPU) runtime written in OpenMP 5.1 (including atomic support) [133], a virtual GPU as a debugging-friendly offloading target on the host [134], and improved diagnostics and execution information [135,136]. We redid the OpenMP GPU code generation in LLVM/Clang [137] to improve performance and correctness. This work was complemented by a new LLVM/OpenMP GPU device runtime that helps us further close the performance gap compared to CUDA and other kernel languages [138].…”
Section: Recent Progress
confidence: 99%
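
As one concrete illustration of the features listed, here is a hedged sketch of asynchronous offloading in generic OpenMP 5.x style (not the implementation from [132]; the arrays and sizes are invented): 'nowait' turns the target region into a deferred task the host can overlap with, and 'taskwait' is the synchronization point.

#include <stdio.h>

int main(void) {
    enum { N = 1 << 20 };
    static float a[N], b[N];
    for (int i = 0; i < N; ++i) a[i] = 1.0f;

    /* Asynchronous offload: 'nowait' makes the target region a deferred
     * task, so the host thread continues past it immediately. */
    #pragma omp target teams distribute parallel for map(tofrom: a[0:N]) nowait
    for (int i = 0; i < N; ++i)
        a[i] *= 2.0f;

    /* Host work that can overlap with the device kernel. */
    for (int i = 0; i < N; ++i)
        b[i] = (float)i;

    /* Synchronize with the deferred target task before using its results. */
    #pragma omp taskwait
    printf("a[0] = %f, b[1] = %f\n", a[0], b[1]); /* a[0] expected: 2.000000 */
    return 0;
}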