2018
DOI: 10.1051/epjconf/201817509006

Performance Portability Strategies for Grid C++ Expression Templates

Abstract: One of the key requirements for the Lattice QCD Application Development as part of the US Exascale Computing Project is performance portability across multiple architectures. Using the Grid C++ expression template as a starting point, we report on the progress made with regards to the Grid GPU offloading strategies. We present both the successes and issues encountered in using CUDA, OpenACC and Just-In-Time compilation. Experimentation and performance on GPUs with a SU(3)×SU(3) streaming test will be…

citations
Cited by 9 publications
(10 citation statements)
references
References 5 publications
“…For example, the QUDA library [1] is implementing various backends for extending its employment on GPUs made by other manufacturers than NVIDIA, such as AMD or Intel [2]. And the Grid library, whose development focused on achieving portability on CPU architectures [3], has been also implementing GPU support [4]. There are a handful of more examples where the lattice QCD community is either extending or developing new software for adapting to the diversity of computing architectures available nowadays.…”
Section: Related Work in Lattice QCD (mentioning)
confidence: 99%
“…M. Lin presented successes and challenges encountered in porting the Grid C++ expression template to GPU-based systems and exploring extensively different approaches to integrate CUDA, OpenACC and Just-In-Time compilation [22].…”
Section: Performance Portability Strategies For Grid C++ Expression T... (mentioning)
confidence: 99%
“…This intrinsic function returns the SVE vector register length (in double). In the loop body, we use the intrinsic function svld1() to load slices of the arrays x and y without decomposing the array elements (lines 8-9). Computation proceeds with multiply-add of complex numbers using two calls to svcmla() (the intrinsic function for the FCMLA instruction introduced in Section III-D) (lines 10-11).…”
Section: Complex Arithmetics Using SVE ACLE (I) (mentioning)
confidence: 99%
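
The statement above says that a complex multiply-add is built from two svcmla() calls. The following is a minimal scalar sketch in portable C++ of what that pair of calls computes per element; it is not the cited paper's listing and uses no SVE intrinsics. The helper names fcmla_rot0, fcmla_rot90 and complex_multiply_add are hypothetical, and the sketch assumes that the two calls use rotations 0 and 90 with the accumulator as the first operand.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical scalar model of FCMLA with rotation 0:
// acc += Re(x) * y
inline cplx fcmla_rot0(cplx acc, cplx x, cplx y) {
    return {acc.real() + x.real() * y.real(),
            acc.imag() + x.real() * y.imag()};
}

// Hypothetical scalar model of FCMLA with rotation 90:
// acc += i * Im(x) * y
inline cplx fcmla_rot90(cplx acc, cplx x, cplx y) {
    return {acc.real() - x.imag() * y.imag(),
            acc.imag() + x.imag() * y.real()};
}

// z[i] += x[i] * y[i], written as the two-step sequence.
// A real SVE loop would process a whole vector register per iteration;
// this scalar loop handles one complex number at a time.
void complex_multiply_add(const std::vector<cplx>& x,
                          const std::vector<cplx>& y,
                          std::vector<cplx>& z) {
    assert(x.size() == y.size() && y.size() == z.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        cplx acc = fcmla_rot0(z[i], x[i], y[i]);  // first partial product
        z[i] = fcmla_rot90(acc, x[i], y[i]);      // second partial product
    }
}
```

Applying the two steps in sequence adds the full complex product x[i] * y[i] to z[i], which is why no separate shuffling of real and imaginary parts is needed.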
“…A more general approach targeting various x86 SIMD ISAs, but in particular AVX-512, is Grid [4]. Meanwhile exploratory studies have been performed to extend the portability of Grid to other types of architectures, including GPU-accelerated ones [8]. A much earlier effort targeting architectures comprising NVIDIA GPUs supporting CUDA resulted in the QUDA library [3], which has meanwhile been used for several generations of GPU-accelerated supercomputers.…”
Section: Related Work (mentioning)
confidence: 99%