2011
DOI: 10.1016/j.parco.2011.03.005
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU–CPU clusters

Abstract: Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail…

Cited by 49 publications (28 citation statements) | References 18 publications
“…Simple in its design, the scenario is yet representative for a large class of problems regarding performance considerations for LBM simulations on regular Cartesian grids. It is further used as a benchmark in a variety of LBM performance evaluations and comparisons [7,9,10,14]. A relaxation time τ = 0.6152 ∈ (0.5, 2) is applied and all test runs take 1024 timesteps, thus, 512 α- and β-steps are executed.…”
Section: Results (mentioning)
confidence: 99%
“…We are not the first group setting up such a solver. However, there are a few major differences to other approaches: [55] and [56] both implement the lattice-Boltzmann method on top of a highly optimised octree grid implementation. I.e., the solver itself had to be developed more or less from scratch in a way that is adapted to the grid structure and its specific memory-optimal traversal in the Peano framework [57] or waLBerla [58].…”
Section: Discussion (mentioning)
confidence: 99%
“…I.e., the solver itself had to be developed more or less from scratch in a way that is adapted to the grid structure and its specific memory-optimal traversal in the Peano framework [57] or waLBerla [58]. Also, [56] provides a solver on statically adaptive meshes, which allows for much more sophisticated solutions in terms of run-time optimisation per core and on massively parallel hardware. In contrast, we focus on full dynamical adaptivity as required by the physical systems simulated with ESPResSo, where regions with high refinement requirements move with particles or molecules immersed in the flow.…”
Section: Discussion (mentioning)
confidence: 99%
“…One of them is data structure design, which can affect the speed of memory access. The Array-of-Structures (AoS) layout [19] is chosen on the multi-core CPU, meaning that the particle distribution functions (PDFs) of each cell are stored adjacent in memory. In contrast, in the Structure-of-Arrays (SoA) layout used on the GPU, the PDFs pointing in the same lattice direction are adjacent in global memory.…”
Section: Data Structure Design and Memory Access Model (mentioning)
confidence: 99%
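The AoS/SoA distinction described in the excerpt above can be sketched as follows. This is a minimal illustration, not code from the cited paper: the grid dimensions, the D3Q19 stencil size, and the array names are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical dimensions: a small D3Q19 lattice (19 PDFs per cell).
nx, ny, nz, q = 8, 8, 8, 19

# Array-of-Structures (AoS): the 19 PDFs of one cell sit contiguously
# in memory (PDF index is the fastest-varying axis), which suits CPU
# caches when a whole cell is collided at once.
aos = np.zeros((nx, ny, nz, q))

# Structure-of-Arrays (SoA): all PDFs of one lattice direction sit
# contiguously, so neighbouring GPU threads working on neighbouring
# cells issue coalesced reads from global memory.
soa = np.zeros((q, nx, ny, nz))

# The same PDF (cell (1, 2, 3), lattice direction 5) in both layouts:
aos[1, 2, 3, 5] = 0.25
soa[5, 1, 2, 3] = 0.25

# Converting between the layouts is a transpose of the PDF axis.
assert np.allclose(np.moveaxis(aos, -1, 0), soa)
```

The choice matters because the two architectures reward opposite access patterns: the CPU benefits from spatial locality within a cell, while the GPU benefits from locality across cells for a fixed direction.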
“…In the case of LBM implementation for multi-GPU and heterogeneous CPU-GPU clusters, Wang et al. [18] performed simulations of lid-driven cavity flow on an HPC system named Tsubame comprising 170 NVIDIA Tesla S1070 boxes. Christian et al. [19] developed an approach for heterogeneous simulations on clusters with varying node configurations, using the WaLBerla framework. In terms of ELBM parallelization and optimization strategies, there are a few research works [20,21] that aim to reduce the computational overhead; most of them improve on the numerical calculation aspect.…”
Section: Introduction (mentioning)
confidence: 99%