2017
DOI: 10.1109/tpds.2017.2691770

FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks

citations
Cited by 26 publications
(26 citation statements)
references
References 32 publications
“…We also employ temporal blocking to take advantage of the temporal locality of stencil computation by storing intermediate results of multiple iterations (time steps) on-chip, before finally writing them back to external memory. Unlike many previous studies on FPGAs [14][15][16][17], combining spatial and temporal blocking allows us to achieve high performance without restricting input size.…”
Section: A Base Implementation For First-order Stencils (citation type: mentioning)
confidence: 99%
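The temporal blocking described in this statement can be illustrated with a minimal Python sketch. It is not the citing authors' FPGA design: the 1D 3-point averaging stencil, the overlapped-tiling scheme, and the names `step`, `naive`, and `blocked` are assumptions chosen for illustration. Each tile is loaded once with a halo of `steps` cells per side (standing in for on-chip storage), advanced `steps` time steps locally, and only its interior is written back.

```python
def step(cells):
    """One 3-point averaging time step; the two end cells are held fixed.
    Assumes len(cells) >= 2."""
    interior = [
        (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
        for i in range(1, len(cells) - 1)
    ]
    return [cells[0]] + interior + [cells[-1]]


def naive(cells, steps):
    """Reference: sweep the whole array once per time step."""
    for _ in range(steps):
        cells = step(cells)
    return cells


def blocked(cells, steps, block):
    """Temporal blocking via overlapped tiles (illustrative assumption).

    Each tile is read once together with a halo of `steps` cells per side,
    advanced `steps` time steps while held locally, and only the tile's
    interior is written back. Halo cells absorb the error that creeps in
    from the artificially fixed tile edges (one cell per time step).
    """
    n = len(cells)
    out = [0.0] * n
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(start - steps, 0)   # halo clipped at the true boundary
        hi = min(end + steps, n)
        tile = cells[lo:hi]          # single read from "external memory"
        for _ in range(steps):
            tile = step(tile)        # intermediate results stay local
        out[start:end] = tile[start - lo:end - lo]  # single write-back
    return out


data = [float((i * 7) % 11) for i in range(20)]
reference = naive(data, 3)
tiled = blocked(data, 3, 4)
```

The halo costs redundant computation that grows with the number of fused time steps, which is one reason the block size and the degree of temporal parallelism have to be tuned together.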
“…We implement spatial blocking by taking advantage of the shifting pattern of stencil computation, and use shift registers that are implemented using FPGA Block RAMs as on-chip buffers to minimize usage of FPGA on-chip memory. This technique is regularly used for stencil computation on FPGAs [14,15,17], but cannot be used on CPUs and GPUs due to lack of hardware support. We also vectorize the computation of each spatial block in the x dimension by unrolling our main loop to update multiple consecutive cells in parallel.…”
Section: A Base Implementation For First-order Stencils (citation type: mentioning)
confidence: 99%
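The shift-register buffering mentioned in this statement can be modeled in software with a short Python sketch. It only imitates the access pattern and is not the citing authors' hardware design: the 5-point averaging stencil, the function name `stencil_stream`, and the use of a `deque` as the shift register are assumptions. The grid is streamed one cell per iteration in row-major order through a buffer of `2 * width + 1` cells, and all five neighbors are read from fixed positions in that buffer.

```python
from collections import deque


def stencil_stream(grid, width):
    """Apply a 5-point average to the interior of a row-major 2D grid,
    reading neighbors only from a shift register of 2*width + 1 cells.
    Boundary cells are copied through unchanged."""
    height = len(grid) // width
    size = 2 * width + 1
    window = deque([0.0] * size, maxlen=size)  # models the shift register
    out = list(grid)
    for i, value in enumerate(grid):
        window.append(value)       # shift in one new cell per iteration
        center = i - width         # cell currently in the middle tap
        if center < 0:
            continue
        row, col = divmod(center, width)
        if 0 < row < height - 1 and 0 < col < width - 1:
            north = window[0]              # center - width
            west = window[width - 1]       # center - 1
            mid = window[width]            # center
            east = window[width + 1]       # center + 1
            south = window[2 * width]      # center + width (newest cell)
            out[center] = (north + west + mid + east + south) / 5.0
    return out


grid = [0.0, 0.0, 0.0,
        0.0, 5.0, 0.0,
        0.0, 0.0, 0.0]
result = stencil_stream(grid, 3)
```

Because every tap sits at a fixed offset, each input cell is read from external memory only once. The vectorization in the x dimension that the statement describes would correspond to widening the buffer and updating several consecutive cells per iteration, which this sketch does not attempt.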
“…Previous work [1,9,20,22] have shown that FPGAs can achieve GPU-level performance in stencil computation. Most of such work achieve this level of performance by relying on temporal blocking without spatial blocking.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…By avoiding spatial blocking, design complexity is significantly reduced and performance can scale near-linearly with the degree of temporal parallelism. However, depending on on-chip memory size, lack of spatial blocking comes at the cost of limiting width for 2D stencils to a few thousands cells [9,20,22], and plane size for 3D stencils to 128 × 128 cells or even less [20,22]. Furthermore, lack of spatial blocking prevents supporting larger input sizes by spatial distribution over multiple FPGAs.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…This is due to the fixed architecture of the GPP, where not all functional units can be fully utilized, and the inherent parallelism of FPGAs and their dynamic architecture. In addition, despite having lower clock frequencies (up to 300MHz), FPGAs can achieve better performances due to their architectures which allow higher levels of parallelism through custom design [80]. In a study by [81], the authors compared the performance and power efficiency of FPGAs to that of GPPs and GPUs using double-precision floating point matrix-vector multiplication.…”
Section: B Dsp-based (citation type: mentioning)
confidence: 99%