Abstract. The best hope for reducing long-standing global climate model biases is by
increasing resolution to the kilometer scale. Here we present results from an
ultrahigh-resolution non-hydrostatic climate model for a near-global setup
running on the full Piz Daint supercomputer with 4888 GPUs (graphics
processing units). The dynamical core of the model has been completely
rewritten using a domain-specific language (DSL) for performance portability
across different hardware architectures. Physical parameterizations and
diagnostics have been ported using compiler directives. To our knowledge this
represents the first complete atmospheric model run entirely on
accelerators at this scale. At a grid spacing of 930 m (1.9 km), we achieve
a simulation throughput of 0.043 (0.23) simulated years per day and an energy
consumption of 596 MWh per simulated year. Furthermore, we propose a new
memory usage efficiency (MUE) metric that considers how efficiently the
memory bandwidth – the dominant bottleneck of climate codes – is being
used.
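The abstract only names the MUE metric; its precise definition is given in the paper itself. As a rough sketch of the underlying idea, a metric of this kind can combine two factors: how much of the data traffic was actually necessary, and how close the achieved bandwidth comes to the hardware peak. The function name, arguments, and normalization below are our assumptions for illustration, not the paper's definition:

```python
def memory_usage_efficiency(q_required, q_actual, bw_achieved, bw_peak):
    """Sketch of a memory-usage-efficiency-style metric (illustrative only).

    q_required  : minimal data volume the algorithm must move (bytes)
    q_actual    : data volume the implementation actually moves (bytes)
    bw_achieved : sustained memory bandwidth of the kernel (e.g. GB/s)
    bw_peak     : peak memory bandwidth of the hardware (same units)
    """
    io_efficiency = q_required / q_actual   # 1.0 means no redundant traffic
    bw_efficiency = bw_achieved / bw_peak   # 1.0 means bandwidth-saturated
    return io_efficiency * bw_efficiency

# Example: a kernel moving twice the necessary data at 80% of peak bandwidth
assert memory_usage_efficiency(1.0, 2.0, 0.8, 1.0) == 0.4
```

The product form captures that a memory-bound code can waste performance in two independent ways: by moving redundant data (poor algorithmic I/O) or by moving it slowly (poor bandwidth utilization).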
From continuum studies it is known that the Coulomb string tension σC gives an upper bound for the physical (Wilson) string tension σW [1]. How, however, does this relationship translate to the lattice? In this paper we give evidence that, on the lattice, the two string tensions are related at zero temperature but decouple at finite temperature. More precisely, we show that on the lattice the Coulomb gauge confinement scenario is always tied to the spatial string tension, which is known to survive the deconfinement phase transition and to cause screening effects in the quark-gluon plasma. Our analysis is based on the identification and elimination of center vortices, which allows us to control the physical string tension and study its effect on the Coulomb gauge observables. We also show how alternative definitions of the Coulomb potential may sense the deconfinement transition; however, a true static Coulomb gauge order parameter for the phase transition is still elusive on the lattice.
A lattice gauge theory framework for simulations on graphics processing units (GPUs) using NVIDIA's CUDA is presented. The code comprises template classes that arrange data optimally to ensure coalesced reads from device memory and thus achieve maximum performance. In this work we concentrate on applications for lattice gauge fixing in 3+1 dimensional SU(3) lattice gauge field theories. We employ the overrelaxation, stochastic relaxation and simulated annealing algorithms, which are well suited to acceleration by highly parallel architectures like GPUs. The applications support the Coulomb, Landau and maximally Abelian gauges. Moreover, we explore the evolution of the numerical accuracy of the SU(3) valued degrees of freedom over the runtime of the algorithms in single (SP) and double precision (DP). From this we draw conclusions on the reliability of SP and DP simulations and suggest a mixed precision scheme that performs the critical parts of the algorithm in full DP while retaining 80-90% of the SP performance. Finally, multiple GPUs are employed to overcome the memory constraint of single GPUs. A communicator class is presented that effectively hides the MPI data exchange at the boundaries of the lattice domains, performed via the low-bandwidth PCI bus, behind calculations in the inner part of the domain. Linear scaling using 16 NVIDIA Tesla C2070 devices and a maximum performance of 3.5 Teraflops on lattices of size down to 64³ × 256 is demonstrated.
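The accuracy drift of SU(3)-valued degrees of freedom mentioned above, and the motivation for a mixed precision scheme, can be illustrated in plain NumPy (this is not the CUDA code described in the abstract): repeatedly multiplying unitary matrices in single precision drifts away from unitarity far faster than in double precision, and a periodic double-precision correction step restores it. The QR-based reprojection used here is our illustrative stand-in for whatever DP correction the paper applies:

```python
import numpy as np

def random_unitary(n, rng):
    """Draw a random unitary matrix via QR decomposition of a complex Gaussian."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)
    return q

def unitarity_violation(u):
    """Frobenius norm of U^dagger U - I, i.e. the drift away from unitarity."""
    return np.linalg.norm(u.conj().T @ u - np.eye(u.shape[0]))

rng = np.random.default_rng(42)
u_sp = np.eye(3, dtype=np.complex64)   # single-precision accumulator
u_dp = np.eye(3, dtype=np.complex128)  # double-precision accumulator

# Repeated gauge-rotation-like updates accumulate rounding error
for _ in range(5000):
    g = random_unitary(3, rng)
    u_sp = g.astype(np.complex64) @ u_sp
    u_dp = g @ u_dp

dev_sp = unitarity_violation(u_sp.astype(np.complex128))
dev_dp = unitarity_violation(u_dp)

# Mixed-precision idea: keep bulk arithmetic in SP, but occasionally
# re-project the accumulated matrix onto the unitary group in DP.
q_fixed, _ = np.linalg.qr(u_sp.astype(np.complex128))
dev_fixed = unitarity_violation(q_fixed)
```

After the loop, `dev_sp` is many orders of magnitude larger than `dev_dp`, while the DP reprojection brings the SP result back to machine-level unitarity; this is the kind of trade-off behind doing only the critical steps in DP while keeping most of the SP throughput.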