A new GPU implementation for lattice-Boltzmann simulations on sparse geometries

Tomczak, Tadeusz; Szafran, Roman G.

doi:10.1016/j.cpc.2018.04.031

Cited by 22 publications (20 citation statements); references 23 publications


“…Our results show comparable performance in terms of memory bandwidth (up to ∼68.2% peak bandwidth) to a block structured code with two distributions on recent Nvidia GPUs [36] and to full matrix codes. In terms of execution time, the sparse matrix EsoTwist method was found to obtain 99.4% of the performance of the AB-pattern on a recent GPU.…”

Section: Discussion (supporting; confidence: 52%)
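The quote above contrasts a single-data-set sparse method (EsoTwist) with the two-distribution AB-pattern. A rough per-node memory estimate makes that difference concrete. This is a minimal sketch: the helper name `bytes_per_fluid_node` and the 18-entry int32 neighbor table used in the example are illustrative assumptions, not details taken from either paper.

```python
def bytes_per_fluid_node(q: int, n_sets: int, real_bytes: int = 8,
                         n_neighbor_indices: int = 0, index_bytes: int = 4) -> int:
    """Estimate device memory per fluid node (hypothetical helper).

    q                  -- number of discrete velocities (19 for D3Q19)
    n_sets             -- distribution arrays: 2 for the AB-pattern, 1 for in-place schemes
    real_bytes         -- 8 for double precision, 4 for single precision
    n_neighbor_indices -- entries per node in an assumed indirect-addressing table
    index_bytes        -- size of one table entry (4 for int32)
    """
    distributions = n_sets * q * real_bytes
    adjacency = n_neighbor_indices * index_bytes
    return distributions + adjacency


if __name__ == "__main__":
    # D3Q19 in double precision with an assumed 18-entry int32 neighbor table.
    ab = bytes_per_fluid_node(19, n_sets=2, n_neighbor_indices=18)
    single = bytes_per_fluid_node(19, n_sets=1, n_neighbor_indices=18)
    print(f"AB-pattern: {ab} B/node, single data set: {single} B/node")
```

Under these assumptions the distributions dominate, so halving the number of data sets cuts most of the per-node footprint; the index table is a fixed overhead that both layouts pay when indirect addressing is used.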

“…For the larger pipe oriented along the x-axis, we obtain 738.5 MNUPS (59.1% peak bandwidth); for the larger pipe oriented along the y-axis, we obtain 729.8 MNUPS (58.4% peak bandwidth) and, for the z direction, we obtain 727.3 MNUPS (58.2% peak bandwidth). Our performance figures compare to between 407 and 684 MNUPS reported for the GeForce GTX Titan by Tomczak and Szafran [36] using a lattice Boltzmann method with nineteen speeds (D3Q19 lattice) in double precision in a tiling layout (similar to block structuring) and two data sets (AB-pattern), which corresponds to between 48% and 72.6% bandwidth (the numbers are taken directly from [36] as our performance model does not apply directly to their code). The relatively wide range in the performance of [36] is due to differences in the data locality in the different test cases.…”

Section: Isotropy (mentioning; confidence: 92%)
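The quote relates throughput in MNUPS (million node updates per second) to a share of peak memory bandwidth. The conversion is simple arithmetic once a byte count per node update is fixed. The sketch below assumes a D3Q19 double-precision AB-pattern update that reads and writes each of the 19 distributions once (2 × 19 × 8 = 304 bytes) and ignores index traffic; the peak bandwidth is a free parameter. Because each paper uses its own performance model, this sketch will not necessarily reproduce the quoted percentages.

```python
def effective_bandwidth_gbs(mnups: float, bytes_per_update: float) -> float:
    """Memory traffic in GB/s implied by a throughput in MNUPS.

    bytes_per_update is the assumed traffic (reads + writes) of one node update.
    """
    return mnups * 1e6 * bytes_per_update / 1e9


def fraction_of_peak(mnups: float, bytes_per_update: float, peak_gbs: float) -> float:
    """Share of a given peak device bandwidth used by the throughput."""
    return effective_bandwidth_gbs(mnups, bytes_per_update) / peak_gbs


# Assumption: D3Q19, double precision, AB-pattern; each distribution is read once
# and written once per update, and neighbor-index traffic is ignored.
D3Q19_AB_BYTES = 2 * 19 * 8  # 304 bytes

if __name__ == "__main__":
    # 684 MNUPS is the upper figure quoted above for [36].
    bw = effective_bandwidth_gbs(684.0, D3Q19_AB_BYTES)
    print(f"{bw:.1f} GB/s")  # 207.9 GB/s under the 304 B/update assumption
```

Dividing the result by a device's nominal peak bandwidth (via `fraction_of_peak`) gives the kind of percentage quoted above; extra traffic such as index arrays would raise the byte count and therefore the estimated share.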


Esoteric Twist: An Efficient in-Place Streaming Algorithmus for the Lattice Boltzmann Method on Massively Parallel Hardware

Geier, Schönherr (2017). Computation.


Abstract: We present and analyze the Esoteric Twist algorithm for the Lattice Boltzmann Method. Esoteric Twist is a thread-safe in-place streaming method that combines streaming and collision and requires only a single data set. Compared to other in-place streaming techniques, Esoteric Twist minimizes the memory footprint and the memory traffic when indirect addressing is used. Esoteric Twist is particularly suitable for implementing the Lattice Boltzmann Method on Graphics Processing Units.
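Esoteric Twist's indexing is more intricate than can be shown briefly. As a hedged illustration of the underlying idea (one data set, fused streaming and collision, race-free in-place updates), the sketch below implements a different and simpler in-place scheme, the AA-pattern, on a periodic D1Q3 lattice and checks it against a two-buffer (AB) update. The lattice, the BGK relaxation rate `omega = 1.2` and all function names are assumptions for illustration; this is not the Esoteric Twist algorithm.

```python
# Illustrative 1D stand-in (D1Q3, periodic) for the D3Q19 kernels discussed above.
C = (0, 1, -1)             # lattice velocities: rest, +x, -x
W = (2 / 3, 1 / 6, 1 / 6)  # lattice weights
OPP = (0, 2, 1)            # index of the opposite direction
CS2 = 1 / 3                # squared lattice speed of sound


def collide(f, omega=1.2):
    """BGK collision for one node; returns post-collision distributions."""
    rho = sum(f)
    u = (f[1] - f[2]) / rho
    post = []
    for i in range(3):
        cu = C[i] * u
        feq = W[i] * rho * (1 + cu / CS2 + cu * cu / (2 * CS2 * CS2) - u * u / (2 * CS2))
        post.append(f[i] - omega * (f[i] - feq))
    return post


def step_ab(src):
    """Two-buffer (AB) step: collide everywhere, then pull into a new array."""
    n = len(src[0])
    post = [collide([src[i][x] for i in range(3)]) for x in range(n)]
    return [[post[(x - C[i]) % n][i] for x in range(n)] for i in range(3)]


def step_aa_even(f):
    """In-place even step: collide and store each value in the opposite slot."""
    n = len(f[0])
    for x in range(n):
        post = collide([f[i][x] for i in range(3)])
        for i in range(3):
            f[OPP[i]][x] = post[i]  # streaming is deferred to the odd step


def step_aa_odd(f):
    """In-place odd step: gather from neighbors, collide, scatter back."""
    n = len(f[0])
    for y in range(n):
        incoming = [f[OPP[i]][(y - C[i]) % n] for i in range(3)]
        post = collide(incoming)
        for i in range(3):
            # Node y writes exactly the slots it just read, so no races arise.
            f[i][(y + C[i]) % n] = post[i]


def max_difference(n=8, step_pairs=3):
    """Run both schemes for 2 * step_pairs steps and compare the results."""
    init = [[W[i] * (1.0 + 0.1 * ((x + i) % 3)) for x in range(n)] for i in range(3)]
    ab = [row[:] for row in init]
    for _ in range(2 * step_pairs):
        ab = step_ab(ab)
    aa = [row[:] for row in init]  # the single data set, updated in place
    for _ in range(step_pairs):
        step_aa_even(aa)
        step_aa_odd(aa)
    return max(abs(aa[i][x] - ab[i][x]) for i in range(3) for x in range(n))


if __name__ == "__main__":
    print(max_difference())
```

After each even/odd pair the single array is back in its natural layout and matches the two-buffer result, while only one copy of the distributions is ever allocated.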


“…A number of high-performance lattice Boltzmann codes have been developed by various groups, including Palabos [15], waLBerla [16], MUPHY [17], and HARVEY [18]. With the availability of application programming interfaces for general-purpose graphics processing units, there has been increasing interest in GPU implementations of the LBM [19], [20], [21], [22], [23], [14], [24]. These efforts address a variety of aspects including efficient data layouts [22], [14], indirect addressing solutions [19], [23], and multi-GPU implementations [25], [21], [17], [23], [26], [27].…”

Section: B. Related Work (mentioning; confidence: 99%)
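Several of the works cited above rely on indirect addressing for sparse geometries. A minimal sketch of the idea follows, assuming a 2D D2Q5 lattice, a boolean voxel mask and halfway bounce-back at solid neighbors (all illustrative choices): only fluid nodes are stored, and a precomputed neighbor table replaces Cartesian index arithmetic during streaming. The function names are hypothetical.

```python
# D2Q5 velocities (rest, +x, -x, +y, -y): a small 2D stand-in for D3Q19.
C = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))
OPP = (0, 2, 1, 4, 3)  # index of the opposite velocity


def build_sparse_index(mask):
    """Compact the fluid voxels of a boolean mask[y][x] into a 1D list.

    Returns the fluid coordinates in storage order and a lookup from (x, y)
    to storage index; solid voxels get no storage at all.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, fluid in enumerate(row) if fluid]
    ids = {xy: k for k, xy in enumerate(coords)}
    return coords, ids


def build_pull_table(coords, ids):
    """table[k][i] = storage index of the upstream node x_k - c_i, or -1 if solid/outside."""
    return [[ids.get((x - cx, y - cy), -1) for (cx, cy) in C] for (x, y) in coords]


def stream_pull(f, table):
    """Pull streaming on compacted arrays f[i][k]; missing neighbors use bounce-back."""
    n = len(table)
    out = [[0.0] * n for _ in C]
    for k in range(n):
        for i in range(len(C)):
            src = table[k][i]
            # Upstream fluid node: read through the index table (indirect addressing).
            # Upstream solid: halfway bounce-back returns this node's opposite value.
            out[i][k] = f[i][src] if src >= 0 else f[OPP[i]][k]
    return out


if __name__ == "__main__":
    mask = [[True, True, False],
            [True, True, True]]
    coords, ids = build_sparse_index(mask)
    table = build_pull_table(coords, ids)
    f = [[float(i + k) for k in range(len(coords))] for i in range(len(C))]
    out = stream_pull(f, table)
    print(len(coords), sum(map(sum, f)), sum(map(sum, out)))
```

Memory then scales with the number of fluid nodes rather than the bounding box, which is what makes this layout attractive for sparse vascular or porous geometries; the price is the extra table read per distribution.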

GPU Acceleration of the HemeLB Code for Lattice Boltzmann Simulations in Sparse Complex Geometries

et al. 2021


We present an implementation and scaling analysis of a GPU-accelerated kernel for HemeLB, a high-performance Lattice Boltzmann code for sparse complex geometries. We describe the structure of the GPU implementation and study the scalability of HemeLB on a GPU cluster under normal operating conditions and with real-world application cases. We investigate the effect of CUDA block size and GPU over-subscription on single-GPU performance, and we present a strong-scaling analysis of multi-GPU parallel simulations using two different hardware models (P100 and V100) and a variety of large cerebral aneurysm geometries. We find that HemeLB-GPU achieves single-GPU speedups of 60x (P100) and 120x (V100) compared to a single CPU core, with good scalability up to 32 GPUs. We also discuss strategies to improve both the kernel performance and the scalability of HemeLB-GPU to larger numbers of GPUs. The GPU implementation supports the LBGK collision kernel, boundary conditions for walls and inlets/outlets, and several lattice types (D3Q15, D3Q19, D3Q27), and it integrates seamlessly with the existing HemeLB infrastructure for graph partitioning and parallelization via MPI. The GPU implementation is expected to let HemeLB users make better use of heterogeneous high-performance computing systems for large-scale lattice Boltzmann simulations.
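The abstract reports speedups against a single CPU core and strong scaling up to 32 GPUs. The standard metrics behind such statements are sketched below; the timings in the usage example are invented placeholders, not results from HemeLB-GPU, and the function names are illustrative.

```python
def speedup(t_baseline: float, t: float) -> float:
    """Speedup of a run taking time t relative to a baseline run."""
    return t_baseline / t


def strong_scaling_efficiency(t_ref: float, p_ref: int, t: float, p: int) -> float:
    """Parallel efficiency for a fixed problem size.

    E = (p_ref * t_ref) / (p * t); E = 1.0 means ideal strong scaling
    from p_ref to p devices (GPUs or cores).
    """
    return (p_ref * t_ref) / (p * t)


if __name__ == "__main__":
    # Hypothetical wall-clock times for one fixed geometry (not measured data).
    times = {1: 100.0, 8: 13.5, 32: 4.0}
    for p, t in times.items():
        e = strong_scaling_efficiency(times[1], 1, t, p)
        print(f"{p:>2} GPUs: speedup {speedup(times[1], t):5.1f}x, efficiency {e:.1%}")
```

Using a reference count `p_ref` other than 1 covers the common case where the problem does not fit on a single device, so efficiency is measured relative to the smallest run that does.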


“…A number of works on the parallel LBM have been carried out on multi-CPUs [21], [22] and multi-GPUs [23]–[28]. However, all of them were conducted on uniform Cartesian grids. In this paper, a parallel 3D FV-LBM on the unstructured grids is developed and analyzed on the Tianhe-2A supercomputer at the National Supercomputing Center in Guangzhou, China.…”

Section: Introduction (mentioning; confidence: 99%)

A scalable parallel unstructured finite volume lattice Boltzmann method for three‐dimensional incompressible flow simulations

Chen (2021). Numerical Methods in Fluids.


The standard lattice Boltzmann method, which uses a regular lattice coupled with discrete velocities as the computational grid, has limited flexibility for simulating flows in irregular geometries. To simulate large-scale complex flows, we present a cell-centered finite volume lattice Boltzmann method for incompressible flows on three-dimensional (3D) unstructured grids, together with its parallel algorithm. The advective fluxes are computed with the low-diffusion Roe scheme, and the gradients of the particle distribution functions are computed with a least-squares method. The scheme is validated on three benchmark flows: (a) 3D Poiseuille flow, (b) cubic cavity flow at Reynolds numbers Re = 100 and 400, and (c) flow past a sphere at Re = 50, 100, 150, 200, and 250. Parallel performance results show that the algorithm scales well, with a parallel efficiency above 87% on 3840 processor cores. The parallel solver therefore shows significant potential for accurate simulation of flows in complex 3D geometries.
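The abstract states that the gradients of the particle distribution functions are obtained with a least-squares method. A minimal sketch for one cell of an unstructured grid follows, assuming an unweighted fit over neighboring cell centers and a scalar value per cell (for example one distribution f_i). The paper may weight neighbors differently, and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def least_squares_gradient(xc, phi_c, neighbors):
    """Unweighted least-squares gradient of a cell-centered scalar.

    Minimizes sum_k (phi_c + g . d_k - phi_k)^2 with d_k = x_k - x_c by solving
    the 3x3 normal equations (sum d d^T) g = sum d (phi_k - phi_c).
    neighbors is a list of (center position, value) pairs.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for xn, phi_n in neighbors:
        d = [xn[j] - xc[j] for j in range(3)]
        dphi = phi_n - phi_c
        for r in range(3):
            b[r] += d[r] * dphi
            for s in range(3):
                a[r][s] += d[r] * d[s]
    det = det3(a)
    if abs(det) < 1e-14:  # crude, scale-dependent guard
        raise ValueError("neighbor offsets are degenerate (too few or coplanar)")
    grad = []
    for col in range(3):  # Cramer's rule: replace one column by the right-hand side
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        grad.append(det3(m) / det)
    return tuple(grad)


if __name__ == "__main__":
    def phi(p):  # linear test field with gradient (2, -3, 0.5)
        return 2 * p[0] - 3 * p[1] + 0.5 * p[2] + 1

    xc = (0.0, 0.0, 0.0)
    pts = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, 0.5, 0.2), (0.3, -0.7, -1)]
    print(least_squares_gradient(xc, phi(xc), [(p, phi(p)) for p in pts]))
```

For a linear field the fit is exact, which is a convenient sanity check; on real unstructured meshes, inverse-distance weights are a common variation of this unweighted form.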
