2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
DOI: 10.1109/cgo53902.2022.9741290
Efficient Execution of OpenMP on GPUs

Cited by 25 publications (4 citation statements)
References 21 publications
“…Though CUDA enables granular control of parallelization, it generally requires a complete rewrite of the code, which can be a major disadvantage when optimized serial codes are available. Alternatives like directive-based approaches, such as OpenMP [27] and OpenACC [28], use #pragma directives to annotate potentially parallelizable portions of the code and can be used to target GPUs starting from a serial source code, substantially reducing development time and effort.…”
Section: Results
confidence: 99%
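
To make the directive-based style concrete, here is a minimal sketch (the saxpy kernel and array names are illustrative, not taken from the cited papers): a serial loop annotated with OpenMP target directives so the same source can be offloaded to a GPU, e.g. with clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda.

#include <stdio.h>

/* Serial saxpy loop annotated for GPU offload: 'target' moves execution
 * to the device, 'map' describes the data transfers, and
 * 'teams distribute parallel for' parallelizes the iterations. */
void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    enum { N = 1024 };
    float x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy(N, 3.0f, x, y);
    printf("y[0] = %f\n", y[0]); /* expected: 5.000000 */
    return 0;
}

If no device is available (or the pragma is ignored by a compiler without OpenMP support), the loop still executes serially on the host, which is exactly the incremental-porting property the statement highlights.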
“…This directive signifies that the subsequent 'for' loop will be executed in a multi-threaded fashion, provided there is no interdependence between loop iterations. Upon completing their respective tasks, the threads in the team wait at an implicit barrier at the end of the single construct, unless a 'nowait' clause is specified [9].…”
Section: OpenMP
confidence: 99%
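
A minimal sketch of the barrier semantics described in that statement (the work inside the loop is a hypothetical placeholder): iterations of the 'for' construct are divided among the team, an implicit barrier follows the loop, and 'nowait' on the 'single' construct lets the other threads proceed without waiting.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int data[100];
    #pragma omp parallel
    {
        /* Loop iterations are divided among the team of threads;
         * an implicit barrier follows the loop unless 'nowait' is given. */
        #pragma omp for
        for (int i = 0; i < 100; ++i)
            data[i] = i * i;

        /* Exactly one thread executes this block; 'nowait' removes the
         * implicit barrier at its end, so the other threads move on
         * immediately instead of waiting here. */
        #pragma omp single nowait
        printf("data[99] = %d (printed by thread %d)\n",
               data[99], omp_get_thread_num());
    }
    return 0;
}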
“…In recent work we implemented the loop transformation constructs introduced in OpenMP 5.1 [70,71], asynchronous offloading for OpenMP [132], efficient lowering of idiomatic OpenMP code to GPUs (under review), OpenMP-aware compiler optimizations with informative and actionable remarks for users (under review), a portable OpenMP device (=GPU) runtime written in OpenMP 5.1 (including atomic support) [133], a virtual GPU as a debugging-friendly offloading target on the host [134], and improved diagnostics and execution information [135,136]. We redid the OpenMP GPU code generation in LLVM/Clang [137] to improve performance and correctness. This work was complemented by a new LLVM/OpenMP GPU device runtime that helps us further close the performance gap compared to CUDA and other kernel languages [138].…”
Section: Recent Progress
confidence: 99%
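
As one concrete illustration of the features listed, here is a hedged sketch of asynchronous offloading in generic OpenMP 5.x style (not the implementation from [132]; the arrays and sizes are invented): 'nowait' turns the target region into a deferred task the host can overlap with, and 'taskwait' is the synchronization point.

#include <stdio.h>

int main(void) {
    enum { N = 1 << 20 };
    static float a[N], b[N];
    for (int i = 0; i < N; ++i) a[i] = 1.0f;

    /* Asynchronous offload: 'nowait' makes the target region a deferred
     * task, so the host thread continues past it immediately. */
    #pragma omp target teams distribute parallel for map(tofrom: a[0:N]) nowait
    for (int i = 0; i < N; ++i)
        a[i] *= 2.0f;

    /* Host work that can overlap with the device kernel. */
    for (int i = 0; i < N; ++i)
        b[i] = (float)i;

    /* Synchronize with the deferred target task before using its results. */
    #pragma omp taskwait
    printf("a[0] = %f, b[1] = %f\n", a[0], b[1]); /* a[0] expected: 2.000000 */
    return 0;
}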