2017
DOI: 10.1142/s0129183117500632

Design and optimization of a portable LQCD Monte Carlo code using OpenACC

Abstract: The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core GPUs, exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where…

Citations: cited by 21 publications (23 citation statements)
References: 46 publications
“…The partition function is periodic in $b$ with period $N_x N_y$. Numerical simulations have been performed using the Rational Hybrid Monte-Carlo algorithm (RHMC) [126] implemented in the NISSA code [127] and in the Open-StaPLE code for GPUs [128,129]. We have performed around 100 runs with different combinations of T and B for each value of the pion mass, with average statistics of approximately 3000 RHMC trajectories for each run.…”
Section: Methods (mentioning)
confidence: 99%
“…In these simulations, most of the time is spent in the execution of the so-called Dirac Operator, which is known to be memory-bound [40]. In particular, this benchmark exhibits a double precision operational intensity: I ≈ 0.62.…”
Section: Other Applications (mentioning)
confidence: 99%
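The statement above ties a memory-bound kernel to an operational intensity of I ≈ 0.62. One common way to read such a figure is the roofline model, sketched below. This is only an illustration: it assumes I is expressed in FLOP/byte, and the peak performance and memory bandwidth are hypothetical round numbers, not the specifications of any device used by the citing authors. The function names are likewise made up for this sketch.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s: min(peak compute, intensity * bandwidth).

    intensity     -- operational intensity, assumed to be in FLOP/byte
    peak_gflops   -- peak double-precision throughput (hypothetical value)
    bandwidth_gbs -- peak memory bandwidth in GB/s (hypothetical value)
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def is_memory_bound(intensity, peak_gflops, bandwidth_gbs):
    """A kernel is memory-bound when its intensity lies below the ridge point."""
    ridge_point = peak_gflops / bandwidth_gbs  # FLOP/byte
    return intensity < ridge_point


if __name__ == "__main__":
    I = 0.62        # value quoted in the citation statement
    PEAK = 1000.0   # hypothetical peak, GFLOP/s
    BW = 200.0      # hypothetical bandwidth, GB/s
    print("attainable GFLOP/s:", attainable_gflops(I, PEAK, BW))
    print("memory-bound:", is_memory_bound(I, PEAK, BW))
```

With these illustrative numbers the ridge point sits at 5 FLOP/byte, so a kernel at 0.62 FLOP/byte falls on the bandwidth-limited side of the roofline and can reach only about an eighth of the nominal peak.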
“…Numerical simulations have been performed on the COKA cluster, using 5 computing nodes, each with 8 NVIDIA K80 dual-GPU boards and two 56 Gb/s FDR InfiniBand network interfaces. Our parallel code (Open-StaPLE) is a single [78] and multi [79] GPU implementation of a standard Rational Hybrid Monte-Carlo algorithm. It is an evolution of a previous CUDA code [80], developed using the OpenACC and OpenMPI frameworks to manage respectively parallelism on the GPUs and among the nodes.…”
Section: Numerical Setup (mentioning)
confidence: 99%