A Case Study for Performance Portability Using OpenMP 4.5

Gayatri, Rahulkumar; Yang, Charlene; Kurth, Thorsten; Deslippe, Jack

doi:10.1007/978-3-030-12274-4_4

Cited by 28 publications

(22 citation statements)

References 9 publications

“…On CPUs that support array reductions, we can perform a reduction on the real and imaginary array equivalents rather than the scalars. A detailed analysis of our methodology is presented in Reference 24.…”

Section: Applications: Porting and Results (mentioning)

confidence: 99%
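The array reduction mentioned in the citation statement above can be illustrated with a minimal sketch. It assumes a compiler supporting OpenMP 4.5 array-section reductions; the function name `accumulate_split`, the constant `NBUCKETS`, and the bucket assignment are hypothetical and are not taken from the cited papers. The complex accumulators are split into separate real and imaginary arrays, and each array is reduced as a whole.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 3 /* hypothetical number of accumulators */

/* Sum n complex samples into NBUCKETS accumulators. Real and imaginary
 * parts are kept in separate plain arrays so that each array can appear
 * in an OpenMP array-section reduction. Without OpenMP enabled, the
 * pragma is ignored and the loop runs serially with the same result. */
void accumulate_split(int n,
                      const double *in_re, const double *in_im,
                      double *out_re, double *out_im)
{
    assert(n >= 0 && in_re != NULL && in_im != NULL);

    for (int b = 0; b < NBUCKETS; ++b) {
        out_re[b] = 0.0;
        out_im[b] = 0.0;
    }

    /* Each thread gets private copies of both arrays; the copies are
     * summed element-wise into the originals when the loop finishes. */
    #pragma omp parallel for reduction(+ : out_re[0:NBUCKETS], out_im[0:NBUCKETS])
    for (int i = 0; i < n; ++i) {
        int b = i % NBUCKETS; /* illustrative bucket assignment */
        out_re[b] += in_re[i];
        out_im[b] += in_im[i];
    }
}
```

A caller would pass two input arrays of length `n` and two output arrays of length `NBUCKETS`. Compiling with an OpenMP flag such as `-fopenmp` enables the parallel reduction.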

Experiences in porting mini‐applications to OpenACC and OpenMP on heterogeneous systems

Vergara, Budiardja, Gayatri, et al. 2020. Concurrency and Computation

Self Cite

Summary: This article studies mini‐applications (Minisweep, GenASiS, GPP, and FF) that use computational methods commonly encountered in HPC. We have ported these applications to develop OpenACC and OpenMP versions, and evaluated their performance on Titan (Cray XK7 with K20x GPUs), Cori (Cray XC40 with Intel KNL), Summit (IBM AC922 with Volta GPUs), and Cori‐GPU (Cray CS‐Storm 500NX with Intel Skylake and Volta GPUs). Our goals are for these new ports to be useful to both application and compiler developers, to document and describe the lessons learned and the methodology to create optimized OpenMP and OpenACC versions, and to provide a description of possible migration paths between the two specifications. Cases where specific directives or code patterns result in improved performance for a given architecture are highlighted. We also include discussions of the functionality and maturity of the latest compilers available on the above platforms with respect to OpenACC or OpenMP implementations.

Section: Applications: Porting and Results (mentioning)

confidence: 99%

“…The GPP and FF mini‐apps represent the General Plasmon Pole and Full Frequency self‐energy summations in BerkeleyGW. The GPP mini‐app already has documented OpenMP 4.5 and OpenACC ports 17 …”

Section: Introduction (mentioning)

confidence: 99%

Experiences in porting mini‐applications to OpenACC and OpenMP on heterogeneous systems

Vergara, Budiardja, Gayatri, et al. 2020. Concurrency and Computation

Self Cite

“…Because of the early adoption of OpenMP directives, we were able to learn from the experiences of Vergara Larrea et al. [12], who used OpenMP 4.0 directives to port codes to NVIDIA GPUs. The challenges of using OpenMP 4.5 for performance portability have been documented in detail in work by Gayatri et al. [13]. This study laid the groundwork for improving the TestSNAP serial version using OpenMP. From this study, it was observed that the collapse clause would be better optimized using the column-major data storage format for 2D and higher dimensional arrays.…”

Section: Related Work (mentioning)

confidence: 89%

Evaluating Performance Portability of OpenMP for SNAP on NVIDIA, Intel, and AMD GPUs Using the Roofline Methodology

Mehta, Gayatri, Ghadar, et al. 2021. Lecture Notes in Computer Science

Self Cite

In this paper, we show that OpenMP 4.5 based implementation of TestSNAP, a proxy-app for the Spectral Neighbor Analysis Potential (SNAP) in LAMMPS, can be ported across the NVIDIA, Intel, and AMD GPUs. Roofline analysis is employed to assess the performance of TestSNAP on each of the architectures. The main contributions of this paper are two-fold: 1) Provide OpenMP as a viable option for application portability across multiple GPU architectures, and 2) provide a methodology based on the roofline analysis to determine the performance portability of OpenMP implementations on the target architectures. The GPUs used for this work are Intel Gen9, AMD Radeon Instinct MI60, and NVIDIA Volta V100.

“…In addition, the latest versions of the Nvidia HPC Software Development Kit (SDK) provide new tools and libraries designed to maximize performance by optimizing memory transfers and scaling to multiple devices while targeting heterogeneous resources [52]. Additionally, various studies regarding vendor-agnostic offloading approaches show promising results based on standard APIs and/or open-source, non-proprietary solutions [53,54]. These would be very interesting to explore in future iterations.…”

Section: Discussion (mentioning)

confidence: 99%

A GPU-Based Kalman Filter for Track Fitting

Mania, Gray, et al. 2021. Comput Softw Big Sci

Computing centres, including those used to process High-Energy Physics data and simulations, are increasingly providing significant fractions of their computing resources through hardware architectures other than x86 CPUs, with GPUs being a common alternative. GPUs can provide excellent computational performance at a good price point for tasks that can be suitably parallelized. Charged particle (track) reconstruction is a computationally expensive component of HEP data reconstruction, and thus needs to use available resources in an efficient way. In this paper, an implementation of Kalman filter-based track fitting using CUDA and running on GPUs is presented. This utilizes the ACTS (A Common Tracking Software) toolkit; an open source and experiment-independent toolkit for track reconstruction. The implementation details and parallelization approach are described, along with the specific challenges for such an implementation. Detailed performance benchmarking results are discussed, which show encouraging performance gains over a CPU-based implementation for representative configurations. Finally, a perspective on the challenges and future directions for these studies is outlined. These include more complex and realistic scenarios which can be studied, and anticipated developments to software frameworks and standards which may open up possibilities for greater flexibility and improved performance.
