2022
DOI: 10.3390/computation10060092

Esoteric Pull and Esoteric Push: Two Simple In-Place Streaming Schemes for the Lattice Boltzmann Method on GPUs

Abstract: I present two novel thread-safe in-place streaming schemes for the lattice Boltzmann method (LBM) on graphics processing units (GPUs), termed Esoteric Pull and Esoteric Push, that result in the LBM only requiring one copy of the density distribution functions (DDFs) instead of two, greatly reducing memory demand. These build upon the idea of the existing Esoteric Twist scheme, to stream half of the DDFs at the end of one stream-collide kernel and the remaining half at the beginning of the next and offer the sa…

Cited by 15 publications (12 citation statements)
References 54 publications
“…This offers a large benefit, most prominently on FP16 accuracy, by substantially reducing numerical loss of significance at no additional computational cost. Since it is also beneficial for regular FP32 accuracy, it is already widely used in LBM codes such as our FluidX3D [6][7][8][9][10][11][12], OpenLB [68][69][70][71], ESPResSo [24][25][26], Palabos [72][73][74][75][76], and some versions of waLBerla [53]. In Appendix A 2, we provide the entire algorithm without and with DDF-shifting for comparison and in Appendix A 3 we clarify our notation.…”
Section: B DDF-shifting (mentioning)
confidence: 99%
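
The loss-of-significance argument in the statement above can be illustrated with a small sketch. It assumes the common formulation of DDF-shifting, in which the stored value is f_i - w_i rather than f_i (the quoted text does not spell this out), and it uses plain IEEE-754 FP16 rather than any customized 16-bit format. The names `to_fp16`, `w`, and `delta` are illustrative, not taken from any of the cited codes.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE-754 half-precision value (stdlib only)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Assumed setup: the D3Q19 rest weight and a small deviation from equilibrium.
w = 1.0 / 3.0
delta = 1.0e-5
f = w + delta

# Without shifting: store f itself in FP16, then recover the deviation.
recovered_plain = to_fp16(f) - w

# With shifting (assumed form): store f - w in FP16; the stored value is the deviation.
recovered_shifted = to_fp16(f - w)

err_plain = abs(recovered_plain - delta)
err_shifted = abs(recovered_shifted - delta)

print(f"error without shifting: {err_plain:.3e}")
print(f"error with shifting:    {err_shifted:.3e}")
```

Near 1/3 the FP16 spacing is far coarser than the deviation, so the unshifted value rounds the deviation away, whereas the shifted value sits close to zero where FP16 resolves it much more finely.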
“…For D3Q19, going from FP32/FP32 to FP32-16x reduces the memory footprint by ≈45%, to 93 bytes per node. If 16-bit compression was combined with in-place streaming schemes like AA-Pattern [34], Esoteric-Twist [62], Shift-and-Swap-Streaming [59], or the simple Esoteric-Pull [9], the memory footprint can even be reduced by ≈67%, to only 55 bytes per node.…”
Section: Memory and Performance Comparison (mentioning)
confidence: 99%
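
The figures quoted in this statement can be reproduced with a short back-of-the-envelope calculation. The per-node breakdown below is an assumption chosen because it matches the quoted totals: 19 DDFs per copy, density and three velocity components stored as FP32 (16 bytes), and a 1-byte flag. The helper names `bytes_per_node` and `reduction_percent` are hypothetical.

```python
def bytes_per_node(q: int, ddf_bytes: int, ddf_copies: int,
                   macro_bytes: int = 16, flag_bytes: int = 1) -> int:
    """Memory per lattice node: DDFs plus macroscopic fields plus flag (assumed layout)."""
    return q * ddf_bytes * ddf_copies + macro_bytes + flag_bytes

def reduction_percent(new: int, old: int) -> float:
    """Relative memory saving of `new` versus `old`, in percent."""
    return 100.0 * (1.0 - new / old)

Q = 19  # D3Q19 velocity set

baseline = bytes_per_node(Q, ddf_bytes=4, ddf_copies=2)    # FP32 DDFs, two copies
compressed = bytes_per_node(Q, ddf_bytes=2, ddf_copies=2)  # 16-bit DDFs, two copies
in_place = bytes_per_node(Q, ddf_bytes=2, ddf_copies=1)    # 16-bit DDFs, one copy

print(baseline, compressed, in_place)
print(round(reduction_percent(compressed, baseline)), round(reduction_percent(in_place, baseline)))
```

Under these assumptions the 16-bit case gives 93 bytes per node and the in-place case gives 55 bytes per node, consistent with the ≈45% and ≈67% reductions stated above.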