2019
DOI: 10.1504/ijhpcn.2019.097051
Performance evaluation of OpenMP's target construct on GPUs - exploring compiler optimisations

Abstract: OpenMP is a directive-based shared memory parallel programming model and has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran languages, without exposing too many details of GPU architectures. However, such high-level programming models generally impose additional program optimizations on compilers and runt…

Cited by 8 publications (3 citation statements) · References 24 publications
“…The analysis presented in the previous publications can be categorized into compiler optimization, runtime overheads, and data management challenges. Compilers optimize compute kernels, achieving high performance with OpenMP target offload on CPU and GPU targets when using teams distribute parallel for constructs and avoiding explicit schedules [24], [25]. Other compiler optimization research has focused on accelerating user code that sits between the target and parallel constructs [24], [26], [27].…”
Section: Related Work
confidence: 99%
“…Compilers optimize compute kernels, achieving high performance with OpenMP target offload on CPU and GPU targets when using teams distribute parallel for constructs and avoiding explicit schedules [24], [25]. Other compiler optimization research has focused on accelerating user code that sits between the target and parallel constructs [24], [26], [27]. A detailed analysis of OpenMP 4.5 support in different compilers shows runtime overheads during the testing of different features [28].…”
Section: Related Work
confidence: 99%
“…Information about the use of OpenMP for GPU programming can be found in the OpenMP specifications [8]. The papers [9], [10], [11] explain the usage of GPU offloading pragmas. A drawback of this approach is its strong similarity to CUDA and OpenACC, since accelerator programming in OpenMP is likewise based on defining compute kernels and parallelizing loops.…”
Section: OpenMP
confidence: 99%