2019
DOI: 10.1155/2019/7807860

First Steps in Porting the LFRic Weather and Climate Model to the FPGAs of the EuroExa Architecture

Abstract: In recent years, there has been renewed interest in the use of field-programmable gate arrays (FPGAs) for high-performance computing (HPC). In this paper, we explore the techniques required by traditional HPC programmers in porting HPC applications to FPGAs, using as an example the LFRic weather and climate model. We report on the first steps in porting LFRic to the FPGAs of the EuroExa architecture. We have used Vivado High-Level Synthesis to implement a matrix-vector kernel from the LFRic code on a Xilinx…
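The abstract's matrix-vector kernel is not reproduced on this page. As a rough illustration of the kind of kernel the paper describes, here is a minimal Vivado HLS sketch of a double-precision matrix-vector product; the function name, the degrees-of-freedom count NDF, and the interface pragmas are all assumptions rather than the paper's actual code.

```cpp
// Hypothetical Vivado HLS matrix-vector kernel (illustrative only:
// NDF and all names are assumptions, not the paper's code).
#define NDF 8  // assumed degrees of freedom per cell

extern "C" void matvec(const double *matrix, const double *x,
                       double *lhs, int ncell) {
#pragma HLS INTERFACE m_axi port=matrix offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=x offset=slave bundle=gmem1
#pragma HLS INTERFACE m_axi port=lhs offset=slave bundle=gmem2
#pragma HLS INTERFACE s_axilite port=ncell
#pragma HLS INTERFACE s_axilite port=return

cells:
    for (int c = 0; c < ncell; ++c) {
rows:
        for (int i = 0; i < NDF; ++i) {
#pragma HLS PIPELINE II=1
            double acc = 0.0;
cols:
            for (int j = 0; j < NDF; ++j) {
#pragma HLS UNROLL
                // Fully unrolled dot product of one matrix row
                // with this cell's local vector.
                acc += matrix[(c * NDF + i) * NDF + j] * x[c * NDF + j];
            }
            lhs[c * NDF + i] = acc;
        }
    }
}
```

Pipelining the row loop while fully unrolling the inner dot product is the usual way to keep one result per clock cycle flowing through such a kernel.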

Cited by 10 publications (13 citation statements)
References 29 publications (32 reference statements)
“…In [11] the authors' matrix-vector kernel involved looping over just two double precision floating point operations, whereas the kernel we are offloading to the FPGA comprises fifty-three double precision floating point operations: twenty-one double precision additions or subtractions and thirty-two double precision multiplications. We are also running on much larger grid sizes: whereas the authors of [11] were limited to a maximum data size of 17 MB in order to keep within the BRAM on the Zynq, in the work detailed in this paper we consider grid sizes resulting in 6.44 GB of prognostic field data (and a further 6.44 GB for the field source terms), necessitating the use of external SDRAM on the PCIe card. Figure 2 illustrates the performance of our HLS PW advection kernel over the numerous steps that we applied one after another, for an experiment of x=512, y=512, z=64 (16.7 million grid cells).…”
Section: FPGA Programming Techniques and Our Approach
confidence: 99%
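The quoted statement contrasts BRAM-resident data in [11] with grids far too large for on-chip memory. A common HLS pattern for that situation, sketched below under assumed names and tile sizes (this is not the citing paper's kernel), is to burst a tile from external SDRAM into a BRAM buffer, compute on it, and write it back.

```cpp
// Illustrative HLS tiling pattern for fields too large for BRAM
// (TILE and all names are assumptions, not the citing paper's code).
#define TILE 64  // assumed tile length, e.g. one z-column

extern "C" void advect_tile(const double *field_in, double *field_out,
                            int ntiles) {
#pragma HLS INTERFACE m_axi port=field_in offset=slave bundle=gmem0
#pragma HLS INTERFACE m_axi port=field_out offset=slave bundle=gmem1
#pragma HLS INTERFACE s_axilite port=ntiles
#pragma HLS INTERFACE s_axilite port=return

    double buf[TILE];  // on-chip BRAM working buffer
    for (int t = 0; t < ntiles; ++t) {
        // Burst-read one tile from external SDRAM into BRAM.
        for (int k = 0; k < TILE; ++k) {
#pragma HLS PIPELINE II=1
            buf[k] = field_in[t * TILE + k];
        }
        // Placeholder compute: the real PW advection kernel performs
        // 53 double-precision operations per grid cell at this point.
        for (int k = 0; k < TILE; ++k) {
#pragma HLS PIPELINE II=1
            field_out[t * TILE + k] = 0.5 * (buf[k] + buf[k]);
        }
    }
}
```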
“…There have been a number of previous activities investigating the role that FPGAs can play in accelerating HPC codes. One such example is [11], where the authors investigated using the high-level productivity design methodology to accelerate the solving of the Helmholtz equation. They offloaded the matrix-vector updates required as part of this solver onto a Zynq Ultrascale; however, the performance they observed was around half of that when the code was run on a twelve-core Broadwell CPU.…”
Section: FPGA Programming Techniques and Our Approach
confidence: 99%
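Offloading such a kernel also needs a small host-side driver. The sketch below shows the usual OpenCL buffer-transfer-and-launch pattern used with HLS kernels; the kernel name and argument list are hypothetical and not taken from [11] or the citing paper.

```cpp
// Minimal host-side OpenCL offload pattern (illustrative only;
// "matvec" and its argument list are hypothetical names).
#include <CL/cl.h>
#include <vector>

void run_matvec(cl_context ctx, cl_command_queue q, cl_kernel krn,
                const std::vector<double> &mat,
                const std::vector<double> &x,
                std::vector<double> &lhs, cl_int ncell) {
    cl_int err;
    // Device buffers for the matrix, input vector, and result.
    cl_mem d_mat = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                  mat.size() * sizeof(double),
                                  (void *)mat.data(), &err);
    cl_mem d_x = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                x.size() * sizeof(double),
                                (void *)x.data(), &err);
    cl_mem d_lhs = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                                  lhs.size() * sizeof(double), nullptr, &err);

    clSetKernelArg(krn, 0, sizeof(cl_mem), &d_mat);
    clSetKernelArg(krn, 1, sizeof(cl_mem), &d_x);
    clSetKernelArg(krn, 2, sizeof(cl_mem), &d_lhs);
    clSetKernelArg(krn, 3, sizeof(cl_int), &ncell);

    // Single work-item launch, the usual pattern for HLS kernels.
    clEnqueueTask(q, krn, 0, nullptr, nullptr);
    // Blocking read of the result back to the host.
    clEnqueueReadBuffer(q, d_lhs, CL_TRUE, 0,
                        lhs.size() * sizeof(double), lhs.data(),
                        0, nullptr, nullptr);

    clReleaseMemObject(d_mat);
    clReleaseMemObject(d_x);
    clReleaseMemObject(d_lhs);
}
```

The transfer cost visible in this pattern is one reason the papers quoted here pay so much attention to data sizes and external SDRAM.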
“…There have been a number of previous activities investigating the role that FPGAs can play in accelerating HPC codes. One such example is [11], where the authors investigated using the high-level productivity design methodology via HLS to accelerate solving the Helmholtz equation. They offloaded the matrix-vector updates which are required as part of this solver onto a Zynq Ultrascale; however, the performance obtained was around half of that when the code was run on a twelve-core Broadwell CPU.…”
Section: Background and Related Work
confidence: 99%
“…They offloaded the matrix-vector updates which are required as part of this solver onto a Zynq Ultrascale; however, the performance obtained was around half of that when the code was run on a twelve-core Broadwell CPU. In [11] the authors' matrix-vector kernel involved looping over two double precision floating point operations, whereas in this work we are focused on accelerating a much more complicated kernel comprising fifty-three double precision floating point operations per grid cell. These double precision operations involve twenty-one additions or subtractions and thirty-two double precision multiplications.…”
Section: Background and Related Work
confidence: 99%
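As a quick sanity check on the quoted figures, the sketch below reproduces the arithmetic (twenty-one additions or subtractions plus thirty-two multiplications giving fifty-three operations per cell, and a 512x512x64 grid giving 16.7 million cells) and converts it into an ideal pipelined FLOP rate; the 250 MHz kernel clock is an assumption, not a number from either paper.

```cpp
// Back-of-envelope check of the quoted operation counts and the
// peak rate a fully pipelined kernel would imply.
#include <cstdio>

int main() {
    // Operation counts per grid cell, as quoted in the citation.
    const long adds = 21, muls = 32;
    const long flops_per_cell = adds + muls;   // = 53
    const long cells = 512L * 512L * 64L;      // = 16,777,216 (~16.7M)
    const double clock_hz = 250e6;             // ASSUMED kernel clock

    // At an initiation interval of 1, one cell completes per cycle
    // once the pipeline is full.
    const double gflops = flops_per_cell * clock_hz / 1e9;
    const double sweep_s = cells / clock_hz;

    std::printf("FLOPs/cell = %ld, cells = %ld\n", flops_per_cell, cells);
    std::printf("ideal rate = %.2f GFLOP/s, ideal sweep time = %.3f s\n",
                gflops, sweep_s);
    return 0;
}
```

Under these assumptions a single pipelined kernel instance tops out around 13 GFLOP/s and sweeps the 16.7-million-cell grid in roughly 0.067 s; real designs fall short of this once memory bandwidth and pipeline stalls are accounted for.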