Solving Parabolic Problems Using Multithread and GPU

Hsieh, Chih‐Wei; Kuo, Shanny Hsuan; Kuo, Fang-An; Chou, Chau-Yi

doi:10.1109/ispa.2010.48

Cited by 5 publications

(2 citation statements)

References 2 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Red-Black SOR which is a high efficiency, yielding simple, inexpensive and fully parallelizable method [5] is widely used in parallel computing both on CPU and GPU. Chih-Wei Hsieh [6] implemented Red Black method for solving 2D parabolic partial differential equations on GPU was 11 times faster compared with CPU with the problem size of 400x400, Sheng-Hsiu Kuo [7] solved 2D nonlinear Burgers' equation by using Red-Black SOR method on GPU and got a speed-up ratio of 12 times at mesh size 1026×1026 on GPU compared with CPU, Jonathan M. Cohen [4] and Aaron F. Shinn [8] implemented the Red-Black SOR iteration method to solve 3D CFD problems on GPU with multi-grid relaxation schemes and achieved speed up ratio of 8 times and 15times respectively. As a highly parallel computational method, Red-Black SOR method is suitable for GPU computing and can achieve a high speed up ratio if we use the memory hierarchy properly and allocate memory efficiently according to our experience.…”

Section: Introductionmentioning

confidence: 99%

A GPU Accelerated Red-Black SOR Algorithm for Computational Fluid Dynamics Problems

Liu

Song

et al. 2011

AMR

View full text Add to dashboard Cite

Abstract. GPUs are high performance co-processors of CPU for scientific computing including CFD. We present an optimistic shared memory allocation strategy to solve 2D CFD problems using Red-Black SOR method on GPU with CUDA (Compute Unified Device Architecture). Lid-driven results are compared with the benchmark data. The speed up ratio of same problem size by using NVDIA GTX480 and Intel Core-Dual 3.0GHz processor is discussed, the performance of GPU is 120 times faster than the sequential code on CPU with the problem size of 756 × 756. Based on this work, we conclude that using the memory hierarchy properly has a key role in improving the computational performance of GPU. IntroductionSimulating CFD problems efficiently and accurately is of great importance for scientific computing and engineering applications. GPUs that are originally designed for graphics rendering have become massively-parallel "co-processors" of the Central Processing Unit (CPU). In recent years, GPU technology develops quickly and modern GPU can provide memory bandwidth and floating-point performance that are orders of magnitude faster than a standard CPU [1]. Researchers in CFD field have done a lot of work in parallel computing algorithms and applications on GPU and gotten great achievements. In the aspect of algorithms of CFD, Senocak [2] presented a 3D Navier-Stokes solver on GPU for incompressible flows using Jacobi iteration method, Serban Georgescu [3] developed a Conjugate Gradient solver for 3D Poisson's equation on GPU and reported up to 22 times acceleration when using three GPUs compared with CPU, Jonathan M. Cohen [4] implemented 3D Boussinesq code with Red-Black Gauss-Seidel on GPU and got an acceleration of up to 8 times faster than a CPU.Red-Black SOR which is a high efficiency, yielding simple, inexpensive and fully parallelizable method [5] is widely used in parallel computing both on CPU and GPU. Chih-Wei Hsieh [6] implemented Red Black method for solving 2D parabolic partial differential equations on GPU was 11 times faster compared with CPU with the problem size of 400x400, Sheng-Hsiu Kuo [7] solved 2D nonlinear Burgers' equation by using Red-Black SOR method on GPU and got a speed-up ratio of 12 times at mesh size 1026×1026 on GPU compared with CPU, Jonathan M. Cohen [4] and Aaron F. Shinn [8] implemented the Red-Black SOR iteration method to solve 3D CFD problems on GPU with multi-grid relaxation schemes and achieved speed up ratio of 8 times and 15times respectively. As a highly parallel computational method, Red-Black SOR method is suitable for GPU computing and can achieve a high speed up ratio if we use the memory hierarchy properly and allocate memory efficiently according to our experience.The rest of this paper is organized as follows. Section 2 introduces the GPU hardware architecture and CUDA programming model. Section 3 briefly shows the governing equation and numerical method of incompressible fluid flows. The acceleration strategy of solving CFD problems on GPU is described in section 4 and the res...

show abstract

Section: Introductionmentioning

confidence: 99%

A GPU Accelerated Red-Black SOR Algorithm for Computational Fluid Dynamics Problems

Liu

Song

et al. 2011

AMR

View full text Add to dashboard Cite

show abstract

“…GPUs have also been used to accelerate the Gauss-Seidel method by making use of shading languages, such as Cg, before native GPU computing languages, such as CUDA, were devised [11].The SOR method can lead to even faster convergence. It has been applied to medical analysis [12] and to computational fluid dynamics [13][14][15], as this kind of problems require a large number of calculations to be performed. Red/black SOR has already been applied by Itu in solving the steady state heat conduction equation on GPUs by following optimization strategies recommended by the GPU vendors [16], such as memory padding and usage of shared memory.…”

mentioning

confidence: 99%

Graphics processing unit acceleration of the red/black SOR method

Konstantinidis

Cotronis

2012

Concurrency and Computation

View full text Add to dashboard Cite

SUMMARYThis work presents our strategy, applied optimizations and results in our effort to exploit the computational capabilities of graphics processing units (GPUs) under the CUDA environment in order to solve the Laplacian PDE. The parallelizable red/black successive over-relaxation (SOR) method was used. Additionally, a program for the CPU was developed as a performance reference. Various performance improvements were achieved by using optimization methods, which proved to provide significant speedup. Memory access patterns prove to be a critical factor in efficient program execution on GPUs and it is, therefore, appropriate to follow data reorganization to achieve the highest feasible memory throughput. The same approach exhibits performance benefits on the CPU version, as well. Eventually, a direct comparison of optimal versions' performance was realized. A 10 speedup was measured for the CUDA version on an NVidia GTX480 GPU (NVidia Corp, Sta. Clara, CA, USA), exceeding 142 GB/s bandwidth, over the single threaded CPU version when run on an Intel Core i7 2600K CPU. The results prove that the global memory cache added on recent GPU architectures assist achieving high performance without requiring to employ the special memory types provided by the GPU (i.e. shared, texture or constant memory).

show abstract