FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks

Sano, Kentaro; Yamamoto, Satoru

doi:10.1109/tpds.2017.2691770

Cited by 26 publications

(26 citation statements)

References 32 publications

“…We also employ temporal blocking to take advantage of the temporal locality of stencil computation by storing intermediate results of multiple iterations (time steps) on-chip, before finally writing them back to external memory. Unlike many previous studies on FPGAs [14][15][16][17], combining spatial and temporal blocking allows us to achieve high performance without restricting input size.…”

Section: A Base Implementation For First-order Stencils (mentioning)

confidence: 99%
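The combination of spatial and temporal blocking described in the statement above can be illustrated with a minimal sketch. It is not the citing authors' OpenCL design: the 1D 3-point averaging stencil, the function names (`step`, `naive`, `blocked`), and the overlapped-halo scheme are all assumptions chosen for illustration. Each spatial block is loaded once with a halo of `steps` cells per side, advanced several time steps in a local buffer standing in for on-chip memory, and only then written back.

```python
def step(u):
    """One iteration of a 1D 3-point averaging stencil; the two end cells stay fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def naive(u, steps):
    """Reference: every time step sweeps the whole array (one full memory pass each)."""
    for _ in range(steps):
        u = step(u)
    return u


def blocked(u, steps, block):
    """Overlapped spatial + temporal blocking (hypothetical illustration).

    Each block of `block` cells is copied together with a halo of `steps` cells per
    side, advanced `steps` iterations locally, and only its interior is written back.
    The halo absorbs the stale values that creep inward by one cell per iteration.
    """
    n = len(u)
    result = list(u)
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(start - steps, 0)   # left edge of block plus halo
        hi = min(end + steps, n)     # right edge of block plus halo
        local = u[lo:hi]             # stand-in for an on-chip buffer
        for _ in range(steps):
            local = step(local)
        result[start:end] = local[start - lo:end - lo]
    return result
```

Because the block size is independent of the array length, the input size is not limited by the size of the local buffer, at the cost of recomputing the halo cells for every block.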

“…We implement spatial blocking by taking advantage of the shifting pattern of stencil computation, and use shift registers that are implemented using FPGA Block RAMs as on-chip buffers to minimize usage of FPGA on-chip memory. This technique is regularly used for stencil computation on FPGAs [14,15,17], but cannot be used on CPUs and GPUs due to lack of hardware support. We also vectorize the computation of each spatial block in the x dimension by unrolling our main loop to update multiple consecutive cells in parallel.…”

Section: A Base Implementation For First-order Stencils (mentioning)

confidence: 99%
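The shift-register buffering mentioned in the statement above can be sketched in software as follows. This is only an illustration under stated assumptions: the 2D 5-point averaging stencil, the function name `stencil_stream`, and the use of a Python `deque` in place of a Block RAM shift register are not taken from the cited papers, and the vectorization in the x dimension is left out.

```python
from collections import deque


def stencil_stream(grid):
    """Stream a 2D grid row by row through one shift register of 2*width + 1 cells.

    Once the register is full, five fixed taps hold the north, west, centre, east
    and south neighbours of one cell, so each input cell is read from the source
    only once. Boundary cells are passed through unchanged.
    """
    height, width = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    window = deque(maxlen=2 * width + 1)  # stand-in for a Block RAM shift register
    for idx in range(height * width):
        window.append(grid[idx // width][idx % width])  # shift in one new cell
        if len(window) < window.maxlen:
            continue  # register not yet full
        centre = idx - width  # flat index of the cell in the middle of the window
        y, x = divmod(centre, width)
        if 0 < x < width - 1 and 0 < y < height - 1:
            north = window[0]
            west = window[width - 1]
            mid = window[width]
            east = window[width + 1]
            south = window[-1]
            out[y][x] = (north + south + west + east + mid) / 5.0
    return out
```

The buffer holds two rows plus one cell regardless of the grid height, which is why the approach keeps on-chip memory usage small for a given row width.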

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

Zohouri; Podobas; Matsuoka (2018)

2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper we evaluate the performance of FPGAs for high-order stencil computation using High-Level Synthesis. We show that despite the higher computation intensity and on-chip memory requirement of such stencils compared to first-order ones, our design technique with combined spatial and temporal blocking remains effective. This allows us to reach similar, or even higher, compute performance compared to first-order stencils. We use an OpenCL-based design that, apart from parameterizing performance knobs, also parameterizes the stencil radius. Furthermore, we show that our performance model exhibits the same accuracy as first-order stencils in predicting the performance of high-order ones. On an Intel Arria 10 GX 1150 device, for 2D and 3D star-shaped stencils, we achieve over 700 and 270 GFLOP/s of compute performance, respectively, up to a stencil radius of four. These results outperform the state-of-the-art YASK framework on a modern Xeon for 2D and 3D stencils, and outperform a modern Xeon Phi for 2D stencils, while achieving competitive performance in 3D. Furthermore, our FPGA design achieves better power efficiency in almost all cases.

“…Previous work [1,9,20,22] have shown that FPGAs can achieve GPU-level performance in stencil computation. Most of such work achieve this level of performance by relying on temporal blocking without spatial blocking.…”

Section: Introduction (mentioning)

confidence: 99%

“…By avoiding spatial blocking, design complexity is significantly reduced and performance can scale near-linearly with the degree of temporal parallelism. However, depending on on-chip memory size, lack of spatial blocking comes at the cost of limiting width for 2D stencils to a few thousands cells [9,20,22], and plane size for 3D stencils to 128 × 128 cells or even less [20,22]. Furthermore, lack of spatial blocking prevents supporting larger input sizes by spatial distribution over multiple FPGAs.…”

Section: Introduction (mentioning)

confidence: 99%

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

Zohouri; Podobas; Matsuoka (2018)

Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Recent developments in High Level Synthesis tools have attracted software programmers to accelerate their high-performance computing applications on FPGAs. Even though it has been shown that FPGAs can compete with GPUs in terms of performance for stencil computation, most previous work achieve this by avoiding spatial blocking and restricting input dimensions relative to FPGA on-chip memory. In this work we create a stencil accelerator using Intel FPGA SDK for OpenCL that achieves high performance without having such restrictions. We combine spatial and temporal blocking to avoid input size restrictions, and employ multiple FPGA-specific optimizations to tackle issues arisen from the added design complexity. Accelerator parameter tuning is guided by our performance model, which we also use to project performance for the upcoming Intel Stratix 10 devices. On an Arria 10 GX 1150 device, our accelerator can reach up to 760 and 375 GFLOP/s of compute performance, for 2D and 3D stencils, respectively, which rivals the performance of a highly-optimized GPU implementation. Furthermore, we estimate that the upcoming Stratix 10 devices can achieve a performance of up to 3.5 TFLOP/s and 1.6 TFLOP/s for 2D and 3D stencil computation, respectively.

CCS Concepts: Hardware → Reconfigurable logic and FPGAs; High-level and register-transfer level synthesis.

“…This is due to the fixed architecture of the GPP, where not all functional units can be fully utilized, and the inherent parallelism of FPGAs and their dynamic architecture. In addition, despite having lower clock frequencies (up to 300 MHz), FPGAs can achieve better performances due to their architectures which allow higher levels of parallelism through custom design [80]. In a study by [81], the authors compared the performance and power efficiency of FPGAs to that of GPPs and GPUs using double-precision floating point matrix-vector multiplication.…”

Section: B Dsp-based (mentioning)

confidence: 99%

Software-defined Radios: Architecture, state-of-the-art, and challenges

Akeela; Dezfouli (2018)

Computer Communications

114

Software-defined Radio (SDR) is a programmable transceiver with the capability of operating various wireless communication protocols without the need to change or update the hardware. Progress in the SDR field has led to the escalation of protocol development and a wide spectrum of applications, with more emphasis on programmability, flexibility, portability, and energy efficiency, in cellular, WiFi, and M2M communication. Consequently, SDR has earned a lot of attention and is of great significance to both academia and industry. SDR designers intend to simplify the realization of communication protocols while enabling researchers to experiment with prototypes on deployed networks. This paper is a survey of the state-of-the-art SDR platforms in the context of wireless communication protocols. We offer an overview of SDR architecture and its basic components, then discuss the significant design trends and development tools. In addition, we highlight key contrasts between SDR architectures with regards to energy, computing power, and area, based on a set of metrics. We also review existing SDR platforms and present an analytical comparison as a guide to developers. Finally, we recognize a few of the related research topics and summarize potential solutions.
