2012
DOI: 10.1016/j.cpc.2011.12.011

QCD simulations with staggered fermions on GPUs

Abstract: We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator, but instead the whole Molecular Dynamics trajectory is performed on it. After pointing out the main bottlenecks and how to circumvent them, we discuss the obtained performances. We present some preliminary results regarding OpenCL and multiGPU e…

Citations: cited by 28 publications (26 citation statements)
References: 38 publications

Citation statements
“…Once again, we believe that this is due to some immaturity of the compiler, for which we expect will be resolved in future versions. Fig. 4 addresses the question of the efficiency costs (if any) of our architecture-portable code; it compares the execution time for a full Monte Carlo step (in double precision) of the OpenACC code and a previously developed CUDA implementation [9], optimized for NVIDIA GPUs. Although the two codes are not exactly in a one to one correspondence, the implementations are similar enough to make the test quantitatively meaningful.…”
Section: Performance Analysis
type: mentioning
confidence: 99%
“…Lattice QCD simulations is a typical and well known HPC grand challenge, where physics results are strongly limited by available computational resources [3,4]; over the years, several generations of parallel machines, optimized for LQCD, have been developed [5,6], while the development of LQCD codes running on many core architectures, in particular GPUs, has seen large efforts in the last decade [7][8][9]. Our target is to have a single code able to run on several processors without any major code change while looking for an acceptable trade-off between portability and efficiency [10].…”
Section: Introduction
type: mentioning
confidence: 99%
“…All numerical simulations have been performed using an Rational Hybrid Monte-Carlo (RHMC) algorithm running on Graphics Processing Units (GPUs) [86,87].…”
Section: Simulation Details
type: mentioning
confidence: 99%
“…Consequently, there is an on-going software and algorithm development in order to incorporate GPUs effectively into lattice simulations. See for example [1,2,3,4]. These applications are developed and carried out predominantly on NVIDIA hardware, consistently using the NVIDIA exclusive CUDA language [5] for the interaction with the GPU.…”
Section: Figure 1: Loewe-csc
type: mentioning
confidence: 99%
“…The first lattice simulations in OpenCL were performed in [1] with staggered fermions. On NVIDIA hardware, a significantly lower performance (25% on C1060 and 60% on S2050) of OpenCL was reported compared to CUDA for Hybrid Monte Carlo (HMC) updates.…”
Section: Figure 1: Loewe-csc
type: mentioning
confidence: 99%