Performance Portability Strategies for Grid C++ Expression Templates

Boyle, Peter A.; Clark, Michael A.; DeTar, Carleton; Lin, Meifeng; Rana, Verinder S.; Avilés-Casco, Alejandro Vaquero

doi:10.1051/epjconf/201817509006

Cited by 9 publications

(10 citation statements)

References 5 publications


“…For example, the QUDA library [1] is implementing various backends for extending its employment on GPUs made by other manufacturers than NVIDIA, such as AMD or Intel [2]. And the Grid library, whose development focused on achieving portability on CPU architectures [3], has been also implementing GPU support [4]. There are a handful of more examples where the lattice QCD community is either extending or developing new software for adapting to the diversity of computing architectures available nowadays.…”

Section: Related Work in Lattice QCD (mentioning)

confidence: 99%

Lyncs-API: a Python API for Lattice QCD applications

Bacchio, Finkenrath, Stylianou. 2022. Preprint.


We present Lyncs-API, a Python API for Lattice QCD applications currently under development. Lyncs aims to bring several widely used libraries for Lattice QCD under a common framework. Lyncs flexibly links to libraries for CPUs and GPUs in a way that can accommodate additional computing architectures as these arise, achieving performance-portability for the calculations while maintaining the same high-level workflow. Lyncs distributes calculations using Dask and mpi4py, with bindings to the libraries automatically generated by cppyy. While Lyncs is designed to allow linking to multiple libraries, we focus on a set of targeted packages that include DDalphaAMG, tmLQCD, QUDA and c-lime. More libraries will be added in the future. We also develop generic-purpose tools for facilitating the usage of Python in Lattice QCD and HPC in general. The project is open-source, community-oriented and available on GitHub.


“…M. Lin presented successes and challenges encountered in porting the Grid C++ expression template to GPU-based systems and exploring extensively different approaches to integrate CUDA, OpenACC and Just-In-Time compilation [22].…”

Section: Performance Portability Strategies For Grid C++ Expression T... (mentioning)

confidence: 99%

Lattice QCD on new chips: a community summary

Rago. 2018. EPJ Web Conf.


“…This intrinsic function returns the SVE vector register length (in double). In the loop body, we use the intrinsic function svld1() to load slices of the arrays x and y without decomposing the array elements (lines 8-9). Computation proceeds with multiply-add of complex numbers using two calls to svcmla() (the intrinsic function for the FCMLA instruction introduced in Section III-D) (lines 10-11).…”

Section: Complex Arithmetics Using SVE ACLE (I) (mentioning)

confidence: 99%
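
The excerpt above says that one complex multiply-add takes two svcmla() calls. The following is a minimal sketch of why two steps are needed, written in portable scalar C++ rather than SVE ACLE so that it compiles on any machine. The names `fcmla_rot0`, `fcmla_rot90` and `cmadd` are hypothetical and do not come from the cited paper. The split into a "rotation 0" step and a "rotation 90" step is an assumption based on the usual description of the FCMLA instruction, not a reproduction of the authors' code.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Assumed "rotation 0" step: accumulate the products that involve Re(a).
//   acc.re += Re(a) * Re(b),  acc.im += Re(a) * Im(b)
inline cplx fcmla_rot0(cplx acc, cplx a, cplx b) {
    return {acc.real() + a.real() * b.real(),
            acc.imag() + a.real() * b.imag()};
}

// Assumed "rotation 90" step: accumulate the products that involve Im(a).
//   acc.re -= Im(a) * Im(b),  acc.im += Im(a) * Re(b)
inline cplx fcmla_rot90(cplx acc, cplx a, cplx b) {
    return {acc.real() - a.imag() * b.imag(),
            acc.imag() + a.imag() * b.real()};
}

// z[i] += x[i] * y[i], built from the two partial steps above.
// A vectorized version would process one register-width slice per iteration;
// here each iteration handles a single complex number.
inline void cmadd(std::vector<cplx>& z,
                  const std::vector<cplx>& x,
                  const std::vector<cplx>& y) {
    for (std::size_t i = 0; i < z.size(); ++i) {
        z[i] = fcmla_rot90(fcmla_rot0(z[i], x[i], y[i]), x[i], y[i]);
    }
}
```

Applying both steps in sequence adds the full complex product x[i] * y[i] to z[i]; either step alone contributes only half of the four real multiplications.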

“…A more general approach targeting various x86 SIMD ISAs, but in particular AVX-512, is Grid [4]. Meanwhile exploratory studies have been performed to extend the portability of Grid to other types of architectures, including GPU-accelerated ones [8]. A much earlier effort targeting architectures comprising NVIDIA GPUs supporting CUDA resulted in the QUDA library [3], which has meanwhile been used for several generations of GPU-accelerated supercomputers.…”

Section: Related Work (mentioning)

confidence: 99%

SVE-Enabling Lattice QCD Codes

Meyer, Georg, Pleiter, et al. 2018. 2018 IEEE International Conference on Cluster Computing (CLUSTER).


Optimization of applications for supercomputers of the highest performance class requires parallelization at multiple levels using different techniques. In this contribution we focus on parallelization of particle physics simulations through vector instructions. With the advent of the Scalable Vector Extension (SVE) ISA, future ARM-based processors are expected to provide a significant level of parallelism at this level.
