2016
DOI: 10.1177/1094342015626584

An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication

Abstract: We present performance results and an analysis of a message passing interface (MPI)/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations. The OpenACC implementation covers all solution routines, including a highly tuned element-by-element operator evaluation and a GPUDirect gather–scatter kernel to effect nearest neighbor flux exchanges. Modifications are designed to make effective use of vectorization, str…
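To make the abstract's "element-by-element operator evaluation" concrete, the following is a minimal, hypothetical C/OpenACC sketch of one tensor-product derivative applied independently within every element. The routine name, array layout, and loop schedule are assumptions for illustration and are not taken from the NekCEM source.

    /* Hypothetical sketch (not NekCEM source): an element-by-element
     * tensor-product derivative of the kind the abstract's "operator
     * evaluation" refers to, offloaded with OpenACC.                    */
    #include <stdio.h>
    #include <stdlib.h>

    /* ur(i,j,k,e) = sum_m D(i,m) * u(m,j,k,e): apply the 1-D derivative
     * matrix D (nx x nx) along the first direction of every element.    */
    void local_deriv_r(int nelt, int nx, const double *restrict D,
                       const double *restrict u, double *restrict ur)
    {
        int npts = nx * nx * nx;
        /* One gang per (element, k-plane); vector parallelism over (j, i). */
        #pragma acc parallel loop gang collapse(2) \
                copyin(D[0:nx*nx], u[0:nelt*npts]) copyout(ur[0:nelt*npts])
        for (int e = 0; e < nelt; ++e)
            for (int k = 0; k < nx; ++k) {
                #pragma acc loop vector collapse(2)
                for (int j = 0; j < nx; ++j)
                    for (int i = 0; i < nx; ++i) {
                        double s = 0.0;
                        for (int m = 0; m < nx; ++m)   /* small dense mat-vec */
                            s += D[i*nx + m] * u[((e*nx + k)*nx + j)*nx + m];
                        ur[((e*nx + k)*nx + j)*nx + i] = s;
                    }
            }
    }

    int main(void)
    {
        int nelt = 512, nx = 8;                /* e.g. 8^3 points per element */
        int npts = nx * nx * nx;
        double *D  = malloc(sizeof(double) * nx * nx);
        double *u  = malloc(sizeof(double) * nelt * npts);
        double *ur = malloc(sizeof(double) * nelt * npts);
        for (int i = 0; i < nx * nx; ++i)     D[i] = (i / nx == i % nx);  /* identity */
        for (int i = 0; i < nelt * npts; ++i) u[i] = (double)i;
        local_deriv_r(nelt, nx, D, u, ur);
        printf("ur[0] = %g (expect 0 for identity D)\n", ur[0]);
        free(D); free(u); free(ur);
        return 0;
    }

The key design point the abstract alludes to is that each element's work is a set of small dense matrix products, which map naturally onto gang/vector parallelism without any inter-element dependencies.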

Cited by 36 publications (17 citation statements). References 7 publications.
“…The lower-bound runtime is indicated by the granularity-limit line in the plots. In the present case, the GPU outperforms the CPU, but Otten et al. [11] show that the CPU-based simulations can outperform the GPU from a pure speed standpoint for the case of N = 7, where the granularity limit of the SEDG formulation is reduced to 343 points per core. Although the use of more cores allows the CPU-based simulations to be faster, it does not alter their overall energy consumption, which remains at 2.5× that of the GPU-based runs.…”
Section: Modeling Multi-GPU Performance
confidence: 78%
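For orientation, the 343 points-per-core figure quoted above corresponds, under the assumption that N = 7 denotes seven grid points per direction of a hexahedral element and that the granularity limit is one element per core, to

\[
7 \times 7 \times 7 \;=\; 7^{3} \;=\; 343 \ \text{points per core}.
\]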
“…A major difference is that one can amortize the nearest-neighbor communication costs by updating the surface flux terms for all six components of the vector-field pair (E, H) in a single pass. Figure 4 shows performance results for the OpenACC/GPU-based variant of NekCEM developed in Otten et al. [11]. Timing runs are presented for the Cray XK7, Titan, using one GPU per node. Also shown in panels (b) and (c) are multi-CPU runs using 1, 4, 8, and 16 cores per node on Titan and on the IBM BG/Q, Vesta, for P = 1, 2, 4, ….”
Section: Modeling Multi-GPU Performance
confidence: 99%
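The amortization this excerpt describes (updating all six components of (E, H) in one gather-scatter pass) can be illustrated with the following hedged MPI sketch. The buffer layout, neighbor handling, and function name are invented for illustration and do not reproduce the NekCEM/gslib interface.

    /* Hypothetical sketch (not the NekCEM/gslib interface): one combined
     * nearest-neighbor exchange that carries all six field components
     * (Ex, Ey, Ez, Hx, Hy, Hz) per shared point, so each neighbor is
     * messaged once per step instead of six times.                      */
    #include <mpi.h>
    #include <stdlib.h>

    #define NFLD 6   /* Ex, Ey, Ez, Hx, Hy, Hz */

    /* fld[f][i]: field f at shared interface point i (npts points, assumed
     * common to every neighbor purely to keep the sketch short).          */
    void exchange_all_fields(double *fld[NFLD], int npts,
                             const int *neigh, int nneigh, MPI_Comm comm)
    {
        size_t chunk = (size_t)npts * NFLD;
        double *sendbuf = malloc(nneigh * chunk * sizeof *sendbuf);
        double *recvbuf = malloc(nneigh * chunk * sizeof *recvbuf);
        MPI_Request *req = malloc(2 * nneigh * sizeof *req);

        for (int n = 0; n < nneigh; ++n) {
            double *s = sendbuf + n * chunk;
            for (int f = 0; f < NFLD; ++f)          /* pack six fields at once */
                for (int i = 0; i < npts; ++i)
                    s[f * npts + i] = fld[f][i];
            /* With GPUDirect / CUDA-aware MPI these buffers could be device
             * resident (e.g. exposed via OpenACC host_data use_device).    */
            MPI_Irecv(recvbuf + n * chunk, (int)chunk, MPI_DOUBLE,
                      neigh[n], 0, comm, &req[2 * n]);
            MPI_Isend(s, (int)chunk, MPI_DOUBLE,
                      neigh[n], 0, comm, &req[2 * n + 1]);
        }
        MPI_Waitall(2 * nneigh, req, MPI_STATUSES_IGNORE);

        for (int n = 0; n < nneigh; ++n) {          /* gather-scatter: sum in  */
            const double *r = recvbuf + n * chunk;  /* remote contributions    */
            for (int f = 0; f < NFLD; ++f)
                for (int i = 0; i < npts; ++i)
                    fld[f][i] += r[f * npts + i];
        }
        free(sendbuf); free(recvbuf); free(req);
    }

A caller would invoke this once per time step after the volume kernels; one Waitall per step then replaces six separate rounds of packing, sending, and waiting, which is where the latency amortization comes from.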
“…Codes that utilise MPI+OpenACC include: the electromagnetics code NekCEM (Otten et al., 2016), the Community Atmosphere Model - Spectral Element (CAM-SE) (Norman et al., 2015), and the combustion code S3D (Levesque et al., 2012). Codes that utilise MPI+OpenMP include the computational fluid dynamics code MFIX (Gel et al., 2009), second-order Møller-Plesset perturbation theory (MP2) (Katouda and Nakajima, 2013), and molecular dynamics (Kunaseth et al., 2013).…”
Section: Related Work
confidence: 99%
“…Many applications take advantage of heterogeneous hardware using an approach known as MPI+X, which leverages MPI for communication and an accelerator language (e.g., CUDA and OpenCL) or directive-based language (e.g., OpenMP and OpenACC) for computation. Codes that utilize MPI+OpenACC include: the electromagnetics code NekCEM (25), the Community Atmosphere Model - Spectral Element (CAM-SE) (22), and the combustion code S3D (20). Codes that utilize MPI+OpenMP include the computational fluid dynamics code MFIX (10), second-order Møller-Plesset perturbation theory (MP2) (17), and molecular dynamics (19).…”
Section: Related Work
confidence: 99%
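As a generic illustration of the MPI+X pattern both excerpts describe (MPI for communication, OpenACC directives for the offloaded computation), here is a minimal, self-contained sketch. The 1-D stencil, array names, and ring exchange are placeholders and are not drawn from any of the cited codes.

    /* Hypothetical MPI+OpenACC sketch: OpenACC offloads the local update,
     * MPI exchanges halo data between ranks.                             */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024   /* interior points per rank, plus 2 halo cells */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        double u[N + 2], unew[N + 2];
        for (int i = 0; i < N + 2; ++i) u[i] = (double)rank;

        #pragma acc data copy(u[0:N+2]) create(unew[0:N+2])
        for (int step = 0; step < 100; ++step) {
            /* Halo exchange with MPI; with a CUDA-aware MPI and GPUDirect,
             * host_data exposes device addresses so no host staging occurs. */
            #pragma acc host_data use_device(u)
            {
                MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                             &u[0], 1, MPI_DOUBLE, left,  0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                             &u[N + 1], 1, MPI_DOUBLE, right, 1,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            /* Local computation offloaded with OpenACC. */
            #pragma acc parallel loop present(u, unew)
            for (int i = 1; i <= N; ++i)
                unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
            #pragma acc parallel loop present(u, unew)
            for (int i = 1; i <= N; ++i)
                u[i] = unew[i];
        }

        if (rank == 0) printf("done\n");
        MPI_Finalize();
        return 0;
    }

The division of labor is the point: MPI handles only the thin halo messages, while the bulk of the data stays resident on the accelerator for the directive-annotated loops.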