2019
DOI: 10.1016/j.cpc.2018.04.031

A new GPU implementation for lattice-Boltzmann simulations on sparse geometries

Abstract: We describe a high-performance implementation of the lattice Boltzmann method (LBM) for sparse 3D geometries on graphics processing units (GPUs). The main contribution of this work is a data layout that minimises the number of redundant memory transactions during the propagation step of the LBM. We show that, by using a uniform mesh of small three-dimensional tiles and careful data placement, it is possible to utilise more than 70% of the maximum theoretical GPU memory bandwidth for the D3Q19 lattice and double precision…
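
The tiling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' layout: the tile edge `TILE = 4`, the helper names `build_tile_map` and `value_index`, and the per-direction ordering inside a tile are all assumptions made for illustration. The sketch only shows how a uniform mesh of small tiles lets a sparse geometry store values for non-empty tiles alone.

```python
# Hypothetical sketch of a tiled sparse layout; not the paper's actual data placement.
TILE = 4                    # assumed tile edge length (not stated in the abstract)
NODES_PER_TILE = TILE ** 3  # nodes in one cubic tile
Q = 19                      # D3Q19: 19 distribution values per node


def build_tile_map(fluid_nodes):
    """Give a dense slot number to every tile that holds at least one fluid node."""
    tile_map = {}
    for x, y, z in sorted(fluid_nodes):
        key = (x // TILE, y // TILE, z // TILE)
        if key not in tile_map:
            tile_map[key] = len(tile_map)
    return tile_map


def value_index(tile_map, x, y, z, q):
    """Flat array index of distribution q at node (x, y, z); None if the tile is empty."""
    slot = tile_map.get((x // TILE, y // TILE, z // TILE))
    if slot is None:
        return None
    # Position of the node inside its tile, with x varying fastest.
    local = (x % TILE) + TILE * ((y % TILE) + TILE * (z % TILE))
    # Assumed ordering: within a tile, all values of one direction q are contiguous.
    return (slot * Q + q) * NODES_PER_TILE + local


# A thin channel of 8 fluid nodes along x occupies only two tiles.
channel = [(x, 0, 0) for x in range(8)]
tiles = build_tile_map(channel)
stored_values = len(tiles) * Q * NODES_PER_TILE
```

With this ordering, neighbouring nodes along x in the same tile and direction map to consecutive indices, which is the kind of property a GPU layout needs for coalesced memory access. Tiles with no fluid nodes take no storage at all.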

Cited by 22 publications (20 citation statements)
References 23 publications
“…Our results show comparable performance in terms of memory bandwidth (up to ∼68.2% peak bandwidth) to a block structured code with two distributions on recent Nvidia GPUs [36] and to full matrix codes. In terms of execution time, the sparse matrix EsoTwist method was found to obtain 99.4% of the performance of the AB-pattern on a recent GPU.…”
Section: Discussion (supporting)
confidence: 52%
“…For the larger pipe oriented along the x-axis, we obtain 738.5 MNUPS (59.1% peak bandwidth); for the larger pipe oriented along the y-axis, we obtain 729.8 MNUPS (58.4% peak bandwidth) and, for the z direction, we obtain 727.3 MNUPS (58.2% peak bandwidth). Our performance figures compare to between 407 and 684 MNUPS reported for the GeForce GTX Titan by Tomczak and Szafran [36] using a lattice Boltzmann method with nineteen speeds (D3Q19 lattice) in double precision in a tiling layout (similar to block structuring) and two data sets (AB-pattern), which corresponds to between 48% and 72.6% bandwidth (the numbers are taken directly from [36] as our performance model does not apply directly to their code). The relatively wide range in the performance of [36] is due to differences in the data locality in the different test cases.…”
Section: Isotropy (mentioning)
confidence: 92%
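
The throughput and bandwidth figures quoted in the statement above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the traffic estimate assumes the minimum of one read and one write per distribution value for a two-data-set (AB-pattern) D3Q19 code in double precision, ignoring node-type and addressing traffic. It is therefore not expected to reproduce the percentages taken from [36].

```python
# (MNUPS, fraction of peak bandwidth) pairs as quoted in the citation statement.
quoted = {"x": (738.5, 0.591), "y": (729.8, 0.584), "z": (727.3, 0.582)}


def implied_peak_mnups(mnups, fraction):
    """Throughput at 100% bandwidth implied by one quoted (MNUPS, fraction) pair."""
    return mnups / fraction


def min_traffic_gb_s(mnups, q=19, bytes_per_value=8):
    """Assumed lower bound on memory traffic: one read plus one write per value."""
    bytes_per_update = 2 * q * bytes_per_value  # 304 bytes for D3Q19 in double precision
    return mnups * 1e6 * bytes_per_update / 1e9


peaks = {axis: implied_peak_mnups(*pair) for axis, pair in quoted.items()}
upper_reported = min_traffic_gb_s(684)  # upper end of the range reported for [36]
```

All three pipe orientations imply the same model peak of about 1250 MNUPS, so the quoted percentages are consistent with a single performance model applied to the three runs.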
“…A number of high-performance lattice Boltzmann codes have been developed by various groups, including Palabos [15], waLBerla [16], MUPHY [17], and HARVEY [18]. With the availability of application programming interfaces for general-purpose graphics processing units, there has been increasing interest in GPU implementations of the LBM [19], [20], [21], [22], [23], [14], [24]. These efforts address a variety of aspects including efficient data layouts [22], [14], indirect addressing solutions [19], [23], and multi-GPU implementations [25], [21], [17], [23], [26], [27].…”
Section: B Related Work (mentioning)
confidence: 99%
“…A number of works on the parallel LBM have been carried out on multi-CPUs [21,22] and multi-GPUs [23-28]. However, all of them were conducted on uniform Cartesian grids. In this paper, a parallel 3D FV-LBM on the unstructured grids is developed and analyzed on the Tianhe-2A supercomputer at the National Supercomputing Center in Guangzhou, China.…”
Section: Introduction (mentioning)
confidence: 99%