Many–Core Sustainability by Pragma Directives

Kucher, Andreas; Haase, Gundolf

doi:10.1007/978-3-662-43880-0_51

Cited by 1 publication

(1 citation statement)

References 3 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Specifically for Computational Fluid Dynamics (CFD), applications of OpenMP have proven their effectiveness better than MPI [6]. Meanwhile, OpenACC, with a very similar methodology than OpenMP, has recently appeared as a viable alternative for specialized low-level languages, such as CUDA C and OpenCL for fluid dynamics GPU based applications, with very good results [7,8]. The granularity in OpenACC is considered fine for its origin linked to CUDA.…”

Section: Scientific Programmingmentioning

confidence: 99%

Performance of a Code Migration for the Simulation of Supersonic Ejector Flow to SMP, MIC, and GPU Using OpenMP, OpenMP+LEO, and OpenACC Directives

Couder-Castañeda

Barrios-Piña

Gitler

2015

Scientific Programming

View full text Add to dashboard Cite

A serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives. The purpose is to reduce the development costs and to simplify the maintenance of the application due to the complexity of the FORTRAN source code. This research follows well-proven strategies in order to obtain the best performance in both OpenMP and OpenACC. OpenMP has become the programming standard for scientific multicore software and OpenACC is one true alternative for graphics accelerators without the need of programming low level kernels. The strategies using OpenMP are oriented towards reducing the creation of parallel regions, tasks creation to handle boundary conditions, and a nested control of the loop time for the programming in offload mode specifically for the Xeon Phi. In OpenACC, the strategy focuses on maintaining the data regions among the executions of the kernels. Experiments for performance and validation are conducted here on a 12-core Xeon CPU, Xeon Phi 5110p, and Tesla C2070, obtaining the best performance from the latter. The Tesla C2070 presented an acceleration factor of 9.86X, 1.6X, and 4.5X compared against the serial version on CPU, 12-core Xeon CPU, and Xeon Phi, respectively.

show abstract

Section: Scientific Programmingmentioning

confidence: 99%