2010
DOI: 10.1007/s00450-010-0111-7

A multi-GPU accelerated solver for the three-dimensional two-phase incompressible Navier-Stokes equations

Abstract: The use of graphics hardware for general purpose computations allows scientists to enormously speed up their numerical codes. We presently investigate the impact of this technology on our computational fluid dynamics solver for the three-dimensional two-phase incompressible Navier-Stokes equations, which is based on the level set technique and applies Chorin's projection approach. To our knowledge, this is the first time that a two-phase solver for the Navier-Stokes equations profits from the computation powe…

Cited by 64 publications (40 citation statements)
References 15 publications
“…As mentioned in [19], in numerous publications researchers neglected to include CPU-GPU data transfer in their time measurements, or neglected to tune the CPU program version and leverage multithreading on a multicore CPU architecture to obtain reasonable speedups. For instance, results are compared only to a single CPU in [7], while [19] and [5] look at the real performance of conventional multicore platforms. We support the latter approach by investigating an additional OpenMP implementation that uses all available cores on the platforms.…”
Section: Related Work
confidence: 99%
“…Here, we examine a particular module of KegelSpan and focus on comparing a variety of parallel programming models for GPGPU and multicore CPUs on different types of hardware platforms. Current GPU implementations (such as [6,7]) concentrate on the CUDA programming paradigm for GPGPUs. But other programming models for GPGPU have emerged, and disregarding their existence without specific reason is no longer justified.…”
Section: Related Work
confidence: 99%
“…A test case is the solution for the steady flow around the NACA0012 airfoil at a Mach number of 0.3, a Reynolds number of 1.86 × 10⁶, and an angle of attack of 3.59°. Fig.…”
Section: Numerical Experiments
confidence: 99%
“…Jespersen et al. [5] accelerated the Jacobi iteration section of the CFD code OVERFLOW using a GPU and showed a speedup by a factor of between 2.5 and 3 compared to a single CPU. Griebel et al. [6] implemented and optimized a two-phase solver for the Navier-Stokes equations using Runge-Kutta time integration on a multi-GPU platform and achieved an impressive speedup of 69.6 on eight GPUs/CPUs. Jacobsen et al. [7] utilized the MPI-CUDA programming pattern to implement a Jacobi iterative solver for the incompressible Navier-Stokes equations on the Lincoln GPU cluster with 128 GPUs and obtained a speedup of 130 over the CPU solution using Pthreads on two quad-core Intel Xeon processors.…”
Section: Introduction
confidence: 99%
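The Jacobi sweep referenced in the citation above is attractive for GPUs because every grid point is updated independently from the previous iterate, giving one thread per point with no data dependencies inside a sweep. A minimal sketch for a 2-D Poisson problem with Dirichlet boundaries — the names `jacobi_step`/`jacobi_solve` are hypothetical and this is not the OVERFLOW code:

```python
import numpy as np

def jacobi_step(p, f, dx):
    """One Jacobi sweep for the 2-D Poisson equation  lap p = f  on a uniform
    grid of spacing dx, with Dirichlet boundaries (boundary values of p stay
    fixed). Each interior point depends only on the previous iterate, so the
    update is embarrassingly data-parallel - the property the cited GPU ports
    exploit."""
    p_new = p.copy()
    p_new[1:-1, 1:-1] = 0.25 * (p[2:, 1:-1] + p[:-2, 1:-1] +
                                p[1:-1, 2:] + p[1:-1, :-2] -
                                dx * dx * f[1:-1, 1:-1])
    return p_new

def jacobi_solve(f, dx, iters=2000):
    """Iterate from a zero initial guess (illustrative driver)."""
    p = np.zeros_like(f)
    for _ in range(iters):
        p = jacobi_step(p, f, dx)
    return p
```

Jacobi converges slowly compared to multigrid or Krylov methods, but its regular, point-wise memory access pattern maps so well to GPU hardware that it is a common first kernel to offload, as the speedups quoted above illustrate.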
“…There is a need for GPU clusters and multi-GPU parallel CFD models to study the turbulent flows that are common in engineering practice. For the near future, dual-level parallelism that interleaves CUDA with the Message Passing Interface (MPI) appears to be an adequate choice to address multi-GPU parallelism [10,15,16]. Jacobsen and Senocak [23] investigated tri-level parallelism using CUDA, MPI, and OpenMP for clusters with multiple GPUs per node.…”
Section: Introduction
confidence: 99%