CUDA Implementation of a Navier-Stokes Solver on Multi-GPU Desktop Platforms for Incompressible Flows

Thibault, Julien C.; Şenocak, İnanç

doi:10.2514/6.2009-758

Cited by 144 publications

(79 citation statements)

References 17 publications


“…Adding a dynamic sub-grid model and implementing wall functions in LBM which will be done in the future should capture the wall boundary layers more accurately. Since our algorithm is mainly dominated by GPU memory bandwidth (Thibault and Senocak, 2009) adding more computations will not degrade the real-time capability of our method. Furthermore implementation of nonuniform lattice and extending the algorithm to multiple-GPU platform will enable real-time simulation of indoor environments with complicated geometry and large domain sizes.…”

Section: Discussion (mentioning)

confidence: 99%

“…We apply an interactive and real-time LBM CFD model with an integrated visualisation tool developed in (Delbosc et al, 2014) to evaluate the suitability, accuracy and usefulness of a 3D LBM based real-time, thermal and turbulent air flow solver running on a GPU platform. The implementation of LBM on the GPU is not unique in the sense that traditional CFD based methods could also be implemented on the GPU (Thibault and Senocak, 2009). But due to the local nature of the LBM algorithm along with the absence of any non-local Poisson pressure loop lends itself to be easily parallelisable compared to traditional CFD methodology on GPUs (Delbosc et al, 2014).…”

Section: Introduction (mentioning)

confidence: 99%


Real-time flow simulation of indoor environments using lattice Boltzmann method

et al. 2015


A novel lattice Boltzmann method (LBM)-based 3D computational fluid dynamics (CFD) technique has been implemented on the graphics processing unit (GPU) for the purpose of simulating the indoor environment in real-time. We study the time evolution of the turbulent air flow and temperature inside a test chamber and in a simple model of a four-bed hospital room. The predicted results from LBM are compared with traditional CFD-based large eddy simulations (LES). Reasonable agreement between LBM results and LES method are observed with significantly faster computational times.


“…To list a few of the CUDA-accelerated CFD applications, Elsen et al [11] reported a 3D high-order FDM solver for large calculation on multi-block structured grids; Klöckner et al [16] developed a 3D unstructured high-order nodal DGM solver for the Maxwell's equations; Corrigan et al [10] proposed a 3D FVM solver for compressible inviscid flows on unstructured tetrahedral grids; Zimmerman et al [29] presented an SDM solver for the Navier-Stokes equations on unstructured hexahedral grids; and more as in the references [3,4,12,7,21,23,13,19,8,14,1,9]. However applying CUDA to a legacy CFD code is not likely an easy job since the developer has to define an explicit layout of the threads on the GPU (numbers of blocks, numbers of threads) for each kernel function [15]. So what if the CFD code designers have to meet specific investment requirements like (1) enable GPU computing for legacy CFD programs at a minimum extra cost in time and effort (usually a major concern for large-scale code development), (2) enable the GPU-accelerated programs running on different platforms (similar to the situation that the video game designers would like to make their products available across platforms)?…”

Section: Introduction (mentioning)

confidence: 99%

OpenACC-based GPU Acceleration of a 3-D Unstructured Discontinuous Galerkin Method

Xia, Luo, et al. 2014

52nd Aerospace Sciences Meeting


A GPU-accelerated discontinuous Galerkin (DG) method is presented for the solution of compressible flows on 3-D unstructured grids. The present work has employed two of the most attractive features in a new programming standard of parallel computing, OpenACC: 1) multi-platform/compiler support and 2) descriptive directive interface to upgrade a legacy CFD solver with the capacity of GPU computing, without significant extra cost in recoding, resulting in a highly portable and extensible GPU-accelerated code. In addition, a face renumbering/grouping scheme is proposed to overcome the "race condition" in face-based flux calculations that occurs on GPU vectorization. Performance of the developed double-precision solver is assessed for both simple and complex geometries. Speedup factors up to but not limited to 24× and 1.6× were achieved by comparing the measured computing time of the OpenACC program running on an NVIDIA Tesla K20c GPU to that of the equivalent MPI program running on one single core and full sixteen cores of an AMD Opteron-6128 CPU respectively, indicating a great potential to port more features of the underlying DG solver into the OpenACC framework.
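The abstract above does not spell out how its face renumbering/grouping scheme works. As an illustration only, one common way to avoid the race condition it mentions is to split the faces into groups in which no two faces touch the same cell, so that every update within a group writes to a distinct location. The sketch below uses a simple greedy grouping; the function names `group_faces` and `accumulate_fluxes`, the `(left_cell, right_cell)` face representation, and the sign convention are assumptions made for this example, not the cited authors' method.

```python
def group_faces(faces):
    """Greedily place each face in the first group that shares no cell with it.

    faces: list of (left_cell, right_cell) index pairs (hypothetical layout).
    Returns a list of groups, each a list of face indices.
    """
    groups = []          # groups[g] holds the face indices assigned to group g
    cells_in_group = []  # cells_in_group[g] holds the cells touched by group g
    for f, (left, right) in enumerate(faces):
        for g, used in enumerate(cells_in_group):
            if left not in used and right not in used:
                groups[g].append(f)
                used.update((left, right))
                break
        else:
            # No existing group is conflict-free for this face: open a new one.
            groups.append([f])
            cells_in_group.append({left, right})
    return groups


def accumulate_fluxes(faces, flux, n_cells, groups):
    """Add each face flux to its two neighbouring cells, one group at a time.

    Within a group every face writes to different cells, so the inner loop
    is the part that could safely run concurrently on a GPU.
    """
    residual = [0.0] * n_cells
    for group in groups:
        for f in group:
            left, right = faces[f]
            residual[left] -= flux[f]   # assumed convention: flux leaves the left cell
            residual[right] += flux[f]  # and enters the right cell
    return residual


# Four cells joined in a ring by four faces.
faces = [(0, 1), (1, 2), (2, 3), (0, 3)]
flux = [1.0, 2.0, 3.0, 4.0]
groups = group_faces(faces)
residual = accumulate_fluxes(faces, flux, 4, groups)
```

Greedy grouping does not guarantee the smallest number of groups, but it is enough to show the idea: the groups are processed one after another, and only the faces inside a group are candidates for parallel execution.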


“…The use of GPUs for Euler solvers and incompressible Navier-Stokes solvers has been well documented [10][11][12][13][14][15][16]. Thibault and Senocak [15] developed a single-node multi-GPU 3D incompressible Navier-Stokes solver with a Pthreads-CUDA implementation that targets multi-GPU desktop platforms. This work was extended in Jacobsen et al [16] where an MPI-CUDA implementation was presented and assessed on the NCSA Lincoln Tesla Cluster.…”

Section: Introduction (mentioning)

confidence: 99%

Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

Jacobsen, Şenocak 2011

49th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition

Self Cite


High performance computing using graphics processing units (GPUs) is gaining popularity in the scientific computing field, with many large compute clusters being augmented with multiple GPUs in each node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism using OpenMP for intra-node and MPI for inter-node communication. Comparisons between the tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are shown using computationally large computational fluid dynamics (CFD) simulations. Our results demonstrate that a tri-level parallel implementation does not provide a significant advantage in performance over the dual-level implementation; however, further research is needed to justify our conclusion for a cluster with a high GPU per node density or when using software that can utilize OpenMP's fine-grain parallelism more effectively.
