2019
DOI: 10.1155/2019/7807860

First Steps in Porting the LFRic Weather and Climate Model to the FPGAs of the EuroExa Architecture

Abstract: In recent years, there has been renewed interest in the use of field-programmable gate arrays (FPGAs) for high-performance computing (HPC). In this paper, we explore the techniques required by traditional HPC programmers in porting HPC applications to FPGAs, using as an example the LFRic weather and climate model. We report on the first steps in porting LFRic to the FPGAs of the EuroExa architecture. We have used Vivado High-Level Synthesis to implement a matrix-vector kernel from the LFRic code on a Xilinx…
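The abstract's matrix-vector kernel is not reproduced on this page. As a rough illustration of the kind of kernel the paper describes, here is a minimal Vivado HLS sketch of a double-precision matrix-vector product; the function name, the degrees-of-freedom count NDF, and the interface pragmas are all assumptions rather than the paper's actual code.

```cpp
// Hypothetical Vivado HLS matrix-vector kernel (illustrative only:
// NDF and all names are assumptions, not the paper's code).
#define NDF 8  // assumed degrees of freedom per cell

extern "C" void matvec(const double *matrix, const double *x,
                       double *lhs, int ncell) {
#pragma HLS INTERFACE m_axi port=matrix offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=x offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=lhs offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=ncell
#pragma HLS INTERFACE s_axilite port=return

cells:
    for (int c = 0; c < ncell; ++c) {
rows:
        for (int i = 0; i < NDF; ++i) {
#pragma HLS PIPELINE II=1
            double acc = 0.0;
cols:
            for (int j = 0; j < NDF; ++j) {
#pragma HLS UNROLL
                // Fully unrolled dot product of one matrix row
                // with this cell's local vector.
                acc += matrix[(c * NDF + i) * NDF + j] * x[c * NDF + j];
            }
            lhs[c * NDF + i] = acc;
        }
    }
}
```

Pipelining the row loop while fully unrolling the inner dot product is the usual way to keep one result per clock cycle flowing through such a kernel.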

Cited by 10 publications (13 citation statements)
References 29 publications (32 reference statements)
“…In [11] the authors' matrix-vector kernel involved looping over just two double precision floating point operations, whereas the kernel we are offloading to the FPGA comprises fifty-three double precision floating point operations: twenty-one double precision additions or subtractions and thirty-two double precision multiplications. We are also running on much larger grid sizes: whereas the authors of [11] were limited to a maximum data size of 17 MB in order to keep within the BRAM on the Zynq, in the work detailed in this paper we consider grid sizes resulting in 6.44 GB of prognostic field data (and a further 6.44 GB for the field source terms), necessitating the use of external SDRAM on the PCIe card. Figure 2 illustrates the performance of our HLS PW advection kernel over the numerous steps that we applied one after another, for an experiment of x=512, y=512, z=64 (16.7 million grid cells).…”
Section: FPGA Programming Techniques and Our Approach
confidence: 99%
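The quoted statement contrasts BRAM-resident data in [11] with grids far too large for on-chip memory. A common HLS pattern for that situation, sketched below under assumed names and tile sizes (this is not the citing paper's kernel), is to burst a tile from external SDRAM into a BRAM buffer, compute on it, and write it back.

```cpp
// Illustrative HLS tiling pattern for fields too large for BRAM
// (TILE and all names are assumptions, not the citing paper's code).
#define TILE 64  // assumed tile length, e.g. one z-column

extern "C" void advect_tile(const double *field_in, double *field_out,
                            int ntiles) {
#pragma HLS INTERFACE m_axi port=field_in offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=field_out offset=slave bundle=gmem1
#pragma HLS INTERFACE s_axilite port=ntiles
#pragma HLS INTERFACE s_axilite port=return

    double buf[TILE];  // on-chip BRAM working buffer
    for (int t = 0; t < ntiles; ++t) {
        // Burst-read one tile from external SDRAM into BRAM.
        for (int k = 0; k < TILE; ++k) {
#pragma HLS PIPELINE II=1
            buf[k] = field_in[t * TILE + k];
        }
        // Placeholder compute: the real PW advection kernel performs
        // 53 double-precision operations per grid cell at this point.
        for (int k = 0; k < TILE; ++k) {
#pragma HLS PIPELINE II=1
            field_out[t * TILE + k] = 0.5 * (buf[k] + buf[k]);
        }
    }
}
```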
“…There have been a number of previous activities investigating the role that FPGAs can play in accelerating HPC codes. One such example is [11], where the authors investigated using the high-level productivity design methodology to accelerate the solving of the Helmholtz equation. They offloaded the matrix-vector updates required as part of this solver onto a Zynq Ultrascale; however, the performance they observed was around half of that when the code was run on a twelve-core Broadwell CPU.…”
Section: FPGA Programming Techniques and Our Approach
confidence: 99%
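Offloading such a kernel also needs a small host-side driver. The sketch below shows the usual OpenCL buffer-transfer-and-launch pattern used with HLS kernels; the kernel name and argument list are hypothetical and not taken from [11] or the citing paper.

```cpp
// Minimal host-side OpenCL offload pattern (illustrative only;
// "matvec" and its argument list are hypothetical names).
#include <CL/cl.h>
#include <vector>

void run_matvec(cl_context ctx, cl_command_queue q, cl_kernel krn,
                const std::vector<double> &mat,
                const std::vector<double> &x,
                std::vector<double> &lhs, cl_int ncell) {
    cl_int err;
    // Device buffers for the matrix, input vector, and result.
    cl_mem d_mat = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  mat.size() * sizeof(double),
                                  (void *)mat.data(), &err);
    cl_mem d_x = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                x.size() * sizeof(double),
                                (void *)x.data(), &err);
    cl_mem d_lhs = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                                  lhs.size() * sizeof(double), nullptr, &err);

    clSetKernelArg(krn, 0, sizeof(cl_mem), &d_mat);
    clSetKernelArg(krn, 1, sizeof(cl_mem), &d_x);
    clSetKernelArg(krn, 2, sizeof(cl_mem), &d_lhs);
    clSetKernelArg(krn, 3, sizeof(cl_int), &ncell);

    // Single work-item launch, the usual pattern for HLS kernels.
    clEnqueueTask(q, krn, 0, nullptr, nullptr);
    // Blocking read of the result back to the host.
    clEnqueueReadBuffer(q, d_lhs, CL_TRUE, 0,
                        lhs.size() * sizeof(double), lhs.data(),
                        0, nullptr, nullptr);

    clReleaseMemObject(d_mat);
    clReleaseMemObject(d_x);
    clReleaseMemObject(d_lhs);
}
```

The transfer cost visible in this pattern is one reason the papers quoted here pay so much attention to data sizes and external SDRAM.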
“…There have been a number of previous activities investigating the role that FPGAs can play in accelerating HPC codes. One such example is [11], where the authors investigated using the high-level productivity design methodology via HLS to accelerate solving the Helmholtz equation. They offloaded the matrix-vector updates which are required as part of this solver onto a Zynq Ultrascale; however, the performance obtained was around half of that when the code was run on a twelve-core Broadwell CPU.…”
Section: Background and Related Work
confidence: 99%
“…They offloaded the matrix-vector updates which are required as part of this solver onto a Zynq Ultrascale; however, the performance obtained was around half of that when the code was run on a twelve-core Broadwell CPU. In [11] the authors' matrix-vector kernel involved looping over two double precision floating point operations, whereas in this work we are focused on accelerating a much more complicated kernel comprising fifty-three double precision floating point operations per grid cell. These double precision operations involve twenty-one additions or subtractions and thirty-two double precision multiplications.…”
Section: Background and Related Work
confidence: 99%
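As a quick sanity check on the quoted figures, the sketch below reproduces the arithmetic (twenty-one additions or subtractions plus thirty-two multiplications giving fifty-three operations per cell, and a 512x512x64 grid giving 16.7 million cells) and converts it into an ideal pipelined FLOP rate; the 250 MHz kernel clock is an assumption, not a number from either paper.

```cpp
// Back-of-envelope check of the quoted operation counts and the
// peak rate a fully pipelined kernel would imply.
#include <cstdio>

int main() {
    // Operation counts per grid cell, as quoted in the citation.
    const long adds = 21, muls = 32;
    const long flops_per_cell = adds + muls;   // = 53
    const long cells = 512L * 512L * 64L;      // = 16,777,216 (~16.7M)
    const double clock_hz = 250e6;             // ASSUMED kernel clock

    // At an initiation interval of 1, one cell completes per cycle
    // once the pipeline is full.
    const double gflops = flops_per_cell * clock_hz / 1e9;
    const double sweep_s = cells / clock_hz;

    std::printf("FLOPs/cell = %ld, cells = %ld\n", flops_per_cell, cells);
    std::printf("ideal rate = %.2f GFLOP/s, ideal sweep time = %.3f s\n",
                gflops, sweep_s);
    return 0;
}
```

Under these assumptions a single pipelined kernel instance tops out around 13 GFLOP/s and sweeps the 16.7-million-cell grid in roughly 0.067 s; real designs fall short of this once memory bandwidth and pipeline stalls are accounted for.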