A Case Study of Porting HPGMG from CUDA to OpenMP Target Offload
2020
DOI: 10.1007/978-3-030-58144-2_3

Cited by 12 publications (5 citation statements) · References 25 publications
“…A detailed analysis of OpenMP 4.5 support in different compilers showed runtime overheads during the testing of different features [28]. More recently [11], three compilers supporting OpenMP offloading directives were tested on discrete GPUs; runtime overheads in LLVM/Clang were identified, along with suggestions to manually implement acc_attach to create data structures on the device and establish the association between host and device addresses. As with the HIP/CUDA programming models, data management challenges in directive-based programming have been one of the major hurdles in extending the applicability of OpenMP GPU offloading from benchmarks to full-scale production codes.…”
Section: Related Work (mentioning)
confidence: 99%
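
The manual pointer-attachment workaround mentioned in this statement can be sketched in a few lines of C. The fragment below is illustrative only (vec_t and vec_attach are hypothetical names, not taken from the cited papers): it maps a struct shell and its payload separately, then patches the pointer inside the device copy of the struct so that device code dereferencing it reaches the payload's device address, mimicking OpenACC's acc_attach.

    #include <omp.h>

    /* Hypothetical container type, for illustration only. */
    typedef struct {
      double *data;   /* host pointer to the payload */
      int     n;
    } vec_t;

    void vec_attach(vec_t *v)
    {
      /* Map the struct shell and the payload as separate allocations;
       * in OpenMP 4.5 the pointer inside the device copy of the struct
       * is not guaranteed to be attached automatically. */
      #pragma omp target enter data map(to: v[0:1])
      #pragma omp target enter data map(to: v->data[0:v->n])

      double *payload = v->data;
      /* use_device_ptr translates the payload's host address to its
       * device counterpart ... */
      #pragma omp target data use_device_ptr(payload)
      {
        /* ... and a tiny target region stores that device address into
         * the device copy of the struct; map(alloc:) reuses the
         * already-mapped shell without transferring data. */
        #pragma omp target is_device_ptr(payload) map(alloc: v[0:1])
        { v->data = payload; }
      }
    }

After such a call, target regions that use the mapped struct can dereference v->data safely; OpenMP 5.0 later formalized this behavior with explicit pointer-attachment rules.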
“…In more recent releases [9], new features for managing memory on heterogeneous systems have been added, with full support for accelerator devices. Increasing compiler support and optimization have enabled numerous case studies and user-experience reports on OpenMP target offloading of in-house applications [10], mini-apps [11], and benchmarks [12]. However, the simplicity of the example codes presented in these case studies often creates a challenge when translating and implementing OpenMP target offloading in production-ready applications.…”
Section: Introduction (mentioning)
confidence: 99%
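
As a concrete illustration of the device memory-management routines alluded to in the first sentence of this statement, the following minimal C sketch (assuming a system with one offload device; it is not code from the cited papers) allocates device memory through the OpenMP runtime API, copies data in, runs a target loop, and copies the result back:

    #include <omp.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
      const int n = 1024;
      int dev  = omp_get_default_device();
      int host = omp_get_initial_device();

      double *h = malloc(n * sizeof *h);
      for (int i = 0; i < n; ++i) h[i] = (double)i;

      /* Raw device allocation, outside the mapped data environment. */
      double *d = omp_target_alloc(n * sizeof *d, dev);

      /* omp_target_memcpy(dst, src, bytes, dst_offset, src_offset,
       *                   dst_device, src_device) */
      omp_target_memcpy(d, h, n * sizeof *d, 0, 0, dev, host);

      /* d is already a device address, so mark it is_device_ptr. */
      #pragma omp target teams distribute parallel for is_device_ptr(d)
      for (int i = 0; i < n; ++i) d[i] *= 2.0;

      omp_target_memcpy(h, d, n * sizeof *h, 0, 0, host, dev);

      printf("h[10] = %f\n", h[10]);   /* expect 20.0 */
      omp_target_free(d, dev);
      free(h);
      return 0;
    }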
“…However, research efforts have focused on understanding how OpenMP can be used in a less architecture-specific manner to improve portability [10]. Performance gaps between optimized CUDA code and directive-based versions have been narrowing in recent years, for both OpenACC and OpenMP [2], [3], [20], paving the way for more large applications to invest time in supporting a directive-based version, or to choose a directive-based offloading strategy as the primary programming model for GPU support, as was recently done for the widely-used materials program VASP and the polarizable molecular dynamics program Tinker-HP [1].…”
Section: B. Directives for GPU Offloading (mentioning)
confidence: 99%
“…Numerous studies over the past decade have focused on directive-based offloading as a solution for performance portability [3], [5], [9], [10], [12], [13], [21]. Multiple reports have compared the OpenACC and OpenMP approaches and explored differences in usage and performance on GPUs relative to CUDA, as well as for CPU-based threading and accelerators such as the Xeon Phi [2], [5], [10], [12], using simplified kernels and mini-apps.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, it has recently been extended with improved offloading functionality that allows the compiler to offload certain parts of an application to accelerators such as GPUs and FPGAs. Consequently, OpenMP can now target both CPUs and GPUs, which offers better portability than vendor-specific approaches such as CUDA [36].…”
Section: OpenMP (mentioning)
confidence: 99%
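
A minimal sketch of that portability claim (saxpy is an illustrative example, not code from the cited work): with an offloading-capable compiler, the same annotated loop runs on a GPU when one is present and falls back to the host otherwise, whereas a CUDA kernel would require an NVIDIA device and toolchain.

    #include <omp.h>
    #include <stdio.h>

    void saxpy(int n, float a, const float *x, float *y)
    {
      /* One directive serves both CPU and GPU builds; the map clauses
       * only take effect when the loop is actually offloaded. */
      #pragma omp target teams distribute parallel for \
              map(to: x[0:n]) map(tofrom: y[0:n])
      for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
      enum { N = 1000 };
      float x[N], y[N];
      for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

      saxpy(N, 3.0f, x, y);

      printf("y[0] = %.1f (offload devices: %d)\n",
             y[0], omp_get_num_devices());   /* expect y[0] = 5.0 */
      return 0;
    }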