Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16

Klöwer, Milan; Hatfield, Sam; Croci, Matteo; Dueben, Peter; Palmer, Tim

doi:10.1029/2021ms002684

Cited by 11 publications

(6 citation statements)

References 69 publications


“…Speed‐ups, see Figure 14b), for larger grid sizes lie between 3.8 and 4.2 consistently. For more information on half precision performance on the A64FX for physical models, see (Klöwer et al., 2022).…”

Section: Results (mentioning)

confidence: 99%

Mixed‐Precision for Linear Solvers in Global Geophysical Flows

Ackmann, Dueben, Palmer, et al. 2022

J Adv Model Earth Syst


Semi‐implicit (SI) time‐stepping schemes for atmosphere and ocean models require elliptic solvers that work efficiently on modern supercomputers. This paper reports our study of the potential computational savings when using mixed precision arithmetic in the elliptic solvers. Precision levels as low as half (16 bits) are used and a detailed evaluation of the impact of reduced precision on the solver convergence and the solution quality is performed. This study is conducted in the context of a novel SI shallow‐water model on the sphere, purposely designed to mimic numerical intricacies of modern all‐scale weather and climate (W&C) models. The governing algorithm of the shallow‐water model is based on the non‐oscillatory MPDATA methods for geophysical flows, whereas the resulting elliptic problem employs a strongly preconditioned non‐symmetric Krylov‐subspace Generalized Conjugated‐Residual (GCR) solver, proven in advanced atmospheric applications. The classical longitude/latitude grid is deliberately chosen to retain the stiffness of global W&C models. The analysis of the precision reduction is done on a software level, using an emulator, whereas the performance is measured on actual reduced precision hardware. The reduced‐precision experiments are conducted for established dynamical‐core test‐cases, like the Rossby‐Haurwitz wavenumber 4 and a zonal orographic flow. The study shows that selected key components of the elliptic solver, most prominently the preconditioning and the application of the linear operator, can be performed at the level of half precision. For these components, the use of half precision is found to yield a speed‐up of a factor 4 compared to double precision for a wide range of problem sizes.
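The abstract's central idea is to run selected solver components, such as the preconditioner, in half precision while the outer iteration stays in higher precision. A much simpler solver can illustrate it. The sketch below is a toy built on stated assumptions and is not the authors' MPDATA/GCR code. It replaces GCR with a Jacobi-preconditioned Richardson iteration on a hypothetical diagonally dominant tridiagonal matrix (`tridiagonal_test_matrix`). The preconditioner is applied in NumPy `float16`, while the residual and the solution update are kept in `float64`.

```python
import numpy as np


def tridiagonal_test_matrix(n, diag=4.0):
    """Hypothetical test operator: a diagonally dominant tridiagonal matrix."""
    off = -np.ones(n - 1)
    return np.diag(np.full(n, diag)) + np.diag(off, 1) + np.diag(off, -1)


def mixed_precision_richardson(A, b, iters=60):
    """Jacobi-preconditioned Richardson iteration.

    The preconditioner (division by diag(A)) is stored and applied in float16;
    the residual and the solution update are computed in float64.
    """
    dinv16 = (1.0 / np.diag(A)).astype(np.float16)  # M^{-1} in half precision
    x = np.zeros_like(b, dtype=np.float64)
    for _ in range(iters):
        r = b - A @ x                      # residual in double precision
        scale = np.max(np.abs(r))
        if scale == 0.0:
            break
        # normalise so the half-precision residual stays inside float16's range
        r16 = (r / scale).astype(np.float16)
        z16 = dinv16 * r16                 # preconditioner application in float16
        x += z16.astype(np.float64) * scale
    return x


rng = np.random.default_rng(0)
n = 100
A = tridiagonal_test_matrix(n)
x_true = rng.standard_normal(n)
b = A @ x_true
x = mixed_precision_richardson(A, b)
print("max error:", np.max(np.abs(x - x_true)))
```

The residual is recomputed in double precision at every iteration. Half-precision errors in the preconditioner therefore only slow convergence slightly; they do not cap the attainable accuracy. This is the general reason a low-precision preconditioner can be tolerated inside a higher-precision iteration.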



“…For simpler and smaller codes like SPEEDY, it is possible to identify such precision bottlenecks. In this case, one could rescale the relevant equations to ensure that the float operations can be described by the necessary number format as regards dynamic range and precision (see, e.g., Klöwer et al., 2022). Conversely, for larger, more complex codes, rescaling equations and refactoring code can be prohibitively difficult.…”

Section: Discussion (mentioning)

confidence: 99%

Climate‐change modelling at reduced floating‐point precision with stochastic rounding

Kimpson, Paxton, Chantry, et al. 2023

Quart J Royal Meteoro Soc


Reduced‐precision floating‐point arithmetic is now deployed routinely in numerical weather forecasting over short timescales. However, the applicability of these reduced‐precision techniques to longer‐timescale climate simulations, especially those that seek to describe a dynamical, changing climate, remains unclear. We investigate this question by deploying a global atmospheric, coarse‐resolution model known as Simplified Parameterizations PrimitivE Equation DYnamics (SPEEDY) to simulate a changing climate system subject to increased $\mathrm{CO}_2$ concentrations over a 100‐year timescale. Whilst double precision is typically the operational standard for climate modelling, we find that reduced‐precision solutions are sufficiently accurate. Rounding the floating‐point numbers stochastically, rather than using the more common “round‐to‐nearest” technique, notably improves the performance of the reduced‐precision solutions. Over 100 years, the mean bias error (MBE) in the global mean surface temperature (precipitation) relative to the double‐precision solution is $+1.8\times10^{-2}$ K ($-8\times10^{-4}$ mm$\cdot$(6 hr)$^{-1}$) when integrating numerically at half precision (10 significant bits) with stochastic rounding. Examining the resulting climatic distributions after 100 years, the difference in the expected value of the global surface temperature relative to the double‐precision solution is $\le 5\times10^{-3}$ K, and that for precipitation is $8\times10^{-4}$ mm$\cdot$(6 hr)$^{-1}$. Whilst further research is necessary to extend these results to more complex and higher‐resolution models, they indicate that reduced‐precision techniques and stochastic rounding could be suitable for the next generation of climate models, and they motivate the use of low‐precision hardware to this end.
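Stochastic rounding, as contrasted with round-to-nearest in the abstract, rounds a value up or down to one of its two neighbouring representable numbers. The probability of rounding up is proportional to the distance from the lower neighbour, so the result is exact in expectation. The following is a minimal NumPy sketch of that technique for float16. It is not the emulator used in the SPEEDY study. It assumes finite inputs within the float16 range, and the helper name `stochastic_round_f16` is hypothetical.

```python
import numpy as np


def stochastic_round_f16(x, rng):
    """Stochastically round finite float64 values to float16.

    Each value rounds up to the next float16 with probability
    (x - lower) / (upper - lower), so the rounding is unbiased in expectation.
    Assumes |x| lies within the float16 range (no overflow handling).
    """
    x = np.asarray(x, dtype=np.float64)
    nearest = x.astype(np.float16)                  # round-to-nearest candidate
    near64 = nearest.astype(np.float64)
    down = np.nextafter(nearest, np.float16(-np.inf))
    up = np.nextafter(nearest, np.float16(np.inf))
    lo = np.where(near64 > x, down, nearest)        # float16 neighbour below x
    hi = np.where(near64 < x, up, nearest)          # float16 neighbour above x
    lo64, hi64 = lo.astype(np.float64), hi.astype(np.float64)
    gap = hi64 - lo64
    # probability of rounding up; zero where x is already representable
    p_up = np.divide(x - lo64, gap, out=np.zeros_like(x), where=gap > 0)
    return np.where(rng.random(x.shape) < p_up, hi, lo)


rng = np.random.default_rng(42)
x = np.full(100_000, 1.0 + 2.0**-12)   # a quarter of the float16 spacing above 1
print(x.astype(np.float16).astype(np.float64).mean())          # 1.0: all round down
print(stochastic_round_f16(x, rng).astype(np.float64).mean())  # close to 1 + 2**-12
```

Round-to-nearest maps every sample to 1.0, a systematic bias. Stochastic rounding sends about a quarter of the samples to the next float16 above 1, so the mean is preserved.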

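A single product is enough to show the rescaling idea from the quote above. The values `a = 1e-6` and `b = 1e-2` and the helper `product_f16` are hypothetical; they are not taken from SPEEDY or ShallowWaters.jl. The true product, 1e-8, lies below half the smallest positive float16 subnormal (about 6e-8), so it flushes to zero. Pre-multiplying one factor by a power of two moves the computation into float16's normal range, and the scaling is undone afterwards in float64.

```python
import numpy as np


def product_f16(a, b, scale=1.0):
    """Multiply a and b in float16.

    `a` is pre-multiplied by `scale` before conversion to float16, and the
    float16 product is divided by `scale` again in float64. A power-of-two
    scale keeps the scaling steps themselves exact in binary arithmetic.
    """
    a16 = np.float16(a * scale)
    b16 = np.float16(b)
    return float(a16 * b16) / scale


a, b = 1e-6, 1e-2                          # hypothetical magnitudes
print(product_f16(a, b))                   # 0.0: the product underflows in float16
print(product_f16(a, b, scale=2.0**20))    # close to 1e-08
```

In a real model the scale would be chosen per variable or per equation, so that the float16 operations stay well inside the normal range.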

“…Numerical stability and performance usually dictate this choice, but low precision can add an additional constraint: using a shorter time step can cause stagnation, as tendencies are too small to be added in the time integration. Stagnation from low precision can be overcome with a compensated time integration [61] or with stochastic rounding [32]. Longer orbits with more variables.…”

Section: Orbits in the Lorenz 1996 System (mentioning)

confidence: 99%

Periodic orbits in chaotic systems simulated at low precision

Klöwer, Coveney, Paxton, et al. 2023

Sci Rep

Self Cite


Non-periodic solutions are an essential property of chaotic dynamical systems. Simulations with deterministic finite-precision numbers, however, always yield orbits that are eventually periodic. With 64-bit double-precision floating-point numbers such periodic orbits are typically negligible due to very long periods. The emerging trend to accelerate simulations with low-precision numbers, such as 16-bit half-precision floats, raises questions on the fidelity of such simulations of chaotic systems. Here, we revisit the 1-variable logistic map and the generalised Bernoulli map with various number formats and precisions: floats, posits and logarithmic fixed-point. Simulations are improved with higher precision but stochastic rounding prevents periodic orbits even at low precision. For larger systems the performance gain from low-precision simulations is often reinvested in higher resolution or complexity, increasing the number of variables. In the Lorenz 1996 system, the period lengths of orbits increase exponentially with the number of variables. Moreover, invariant measures are better approximated with an increased number of variables than with increased precision. Extrapolating to large simulations of natural systems, such as million-variable climate models, periodic orbit lengths are far beyond reach of present-day computers. Such orbits are therefore not expected to be problematic compared to high-precision simulations but the deviation of both from the continuum solution remains unclear.
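The abstract's starting point is that deterministic finite-precision iteration must eventually become periodic. This follows from the finite number of states: a float16 map that stays on [0, 1] can visit fewer than 16 000 distinct values before one repeats. The sketch below assumes the logistic map with r = 4, evaluated entirely in NumPy float16 with round-to-nearest; the function names are illustrative. It finds the transient length and the period by remembering every visited state.

```python
import numpy as np

FOUR = np.float16(4.0)
ONE = np.float16(1.0)


def logistic_f16(x):
    """One step of the logistic map x -> 4 x (1 - x); every operation is float16."""
    return FOUR * x * (ONE - x)


def find_orbit(x0, step, max_steps=100_000):
    """Return (transient_length, period) of the deterministic orbit from x0."""
    first_seen = {}                 # state -> iteration at which it first appeared
    x = x0
    for n in range(max_steps):
        key = float(x)              # float16 -> float is exact, so keys are unique
        if key in first_seen:
            start = first_seen[key]
            return start, n - start
        first_seen[key] = n
        x = step(x)
    raise RuntimeError("no repeated state within max_steps")


transient, period = find_orbit(np.float16(0.3), logistic_f16)
print("transient:", transient, "period:", period)
```

With 64-bit floats the same search would generally need far more steps than is practical. This matches the abstract's remark that periodic orbits are typically negligible in double precision.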

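Stagnation, as described in the quoted passage above, occurs when an increment is smaller than half the spacing between neighbouring floats around the current value. Round-to-nearest then discards the increment at every step. The sketch below adds a hypothetical increment of 2**-12 to 1.0 in float16, where the spacing near 1 is 2**-10. For the compensated scheme it substitutes Kahan summation, one common form of compensation. Whether reference [61] of the cited paper uses exactly this form is an assumption, and the function names are illustrative.

```python
import numpy as np


def accumulate_naive(u0, increment, nsteps):
    """Add `increment` nsteps times with plain float16 round-to-nearest."""
    u, du = np.float16(u0), np.float16(increment)
    for _ in range(nsteps):
        u = u + du
    return u


def accumulate_kahan(u0, increment, nsteps):
    """Kahan-compensated accumulation with every operation in float16.

    c holds the rounding error of the previous addition and is subtracted
    from the next increment, so small increments are not lost.
    """
    u, du = np.float16(u0), np.float16(increment)
    c = np.float16(0.0)
    for _ in range(nsteps):
        y = du - c            # increment corrected by the previous rounding error
        t = u + y             # rounded sum
        c = (t - u) - y       # rounding error committed by this addition
        u = t
    return u


inc, n = 2.0**-12, 1000       # hypothetical tendency * dt and number of steps
print(accumulate_naive(1.0, inc, n))   # 1.0: the state stagnates
print(accumulate_kahan(1.0, inc, n))   # close to the exact 1 + 1000 * 2**-12
```

The compensation costs a few extra float16 operations per step but keeps the state in half precision. Stochastic rounding, the alternative named in the quote, instead avoids stagnation by making the discarded increment survive on average.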
