Accelerating solvers for global atmospheric equations through mixed-precision data flow engine

Gan, Lin; Fu, Haohuan; Luk, Wayne; Yang, Chao; Xue, Wei; Zhang, Youhui; Yang, Guangwen

doi:10.1109/fpl.2013.6645508

Cited by 29 publications

(10 citation statements)

References 14 publications

“…Oriato et al show a speed‐up of a meteorological limited area model of up to a factor of 74 on a dataflow node (which is based on FPGAs) compared to a X86 CPU computing node [Oriato et al., ]. Gan et al run a global shallow water model on four FPGAs with a 330 times speed‐up over a 6‐core CPU [Gan et al., ]. We note that it is extremely difficult to make fair comparisons between hardware which is as different as CPUs and FPGAs.…”

Section: Introduction (mentioning)

confidence: 99%

On the use of programmable hardware and reduced numerical precision in earth‐system modeling

Dueben

Russell

Niu

et al. 2015

J Adv Model Earth Syst

Self Cite

Programmable hardware, in particular Field Programmable Gate Arrays (FPGAs), promises a significant increase in computational performance for simulations in geophysical fluid dynamics compared with CPUs of similar power consumption. FPGAs allow adjusting the representation of floating-point numbers to specific application needs. We analyze the performance-precision trade-off on FPGA hardware for the two-scale Lorenz '95 model. We scale the size of this toy model to that of a high-performance computing application in order to make meaningful performance tests. We identify the minimal level of precision at which changes in model results are not significant compared with a maximal precision version of the model and find that this level is very similar for cases where the model is integrated for very short or long intervals. It is therefore a useful approach to investigate model errors due to rounding errors for very short simulations (e.g., 50 time steps) to obtain a range for the level of precision that can be used in expensive long-term simulations. We also show that an approach to reduce precision with increasing forecast time, when model errors are already accumulated, is very promising. We show that a speed-up of 1.9 times is possible in comparison to FPGA simulations in single precision if precision is reduced with no strong change in model error. The single-precision FPGA setup shows a speed-up of 2.8 times in comparison to our model implementation on two 6-core CPUs for large model setups.

“…Hardware designers developed tools to support the automatic conversion of floating point values to fixed point ones [3,13,19,27]. More recent work in this field aims at exploiting fixed point arithmetics on FPGA accelerators to speedup floating point applications, such as [20].…”

Section: Related Work (mentioning)

confidence: 99%

Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Nobre

Reis

Bispo

et al. 2018

Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Archite

Writing mixed-precision kernels makes it possible to achieve higher throughput while keeping output precision within given limits. The recent introduction of native half-precision arithmetic in several GPUs, such as NVIDIA P100 and AMD Vega 10, makes precision tuning even more relevant. However, it is not trivial to manually find which variables should be represented in half precision instead of single or double precision. Although half-precision arithmetic can speed up kernel execution considerably, it can also produce unusable kernel outputs whenever the wrong variables are declared with the half-precision data type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel with a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach is capable of deriving mixed-precision versions of the kernel that improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select a metric to measure and to filter solutions based on the quality of the output. We implement a proof-of-concept of our approach using an aspect-oriented programming language called LARA. It is capable of generating mixed-precision kernels that deliver considerably higher performance than the original single-precision floating-point versions, while generating outputs that can be acceptable in some scenarios.

“…For example, Oriato et al [10] encode the dynamical core of a limited area meteorological model on an FPGA and report a 74× speed-up compared with a 12-core multi-threaded central processing unit (CPU) implementation. A related study [11] uses the technology to integrate a global atmospheric shallow-water system to achieve a 14× acceleration and a 9× increase in energy efficiency compared with a hybrid CPU–GPU implementation. In both of these studies, a variety of reduced precision techniques are used to maximize the efficiency of the FPGA’s finite computational resources.…”

Section: Introduction (mentioning)

confidence: 99%

Bitwise efficiency in chaotic models

2017

Motivated by the increasing energy consumption of supercomputing for weather and climate simulations, we introduce a framework for investigating the bit-level information efficiency of chaotic models. In comparison with previous explorations of inexactness in climate modelling, the proposed and tested information metric has three specific advantages: (i) it requires only a single high-precision time series; (ii) information does not grow indefinitely for decreasing time step; and (iii) information is more sensitive to the dynamics and uncertainties of the model rather than to the implementation details. We demonstrate the notion of bit-level information efficiency in two of Edward Lorenz’s prototypical chaotic models: Lorenz 1963 (L63) and Lorenz 1996 (L96). Although L63 is typically integrated in 64-bit ‘double’ floating point precision, we show that only 16 bits have significant information content, given an initial condition uncertainty of approximately 1% of the size of the attractor. This result is sensitive to the size of the uncertainty but not to the time step of the model. We then apply the metric to the L96 model and find that a 16-bit scaled integer model would suffice given the uncertainty of the unresolved sub-grid-scale dynamics. We then show that, by dedicating computational resources to spatial resolution rather than numeric precision in a field programmable gate array (FPGA), we see up to 28.6% improvement in forecast accuracy, an approximately fivefold reduction in the number of logical computing elements required and an approximately 10-fold reduction in energy consumed by the FPGA, for the L96 model.
