Abstract. This paper presents a GPU computing algorithm used to accelerate the Continuous-based Discrete Element Method (CDEM). Running on an NVIDIA GTX VGA card, the algorithm achieved an average speedup of 650 times over an Intel Core-Dual 2.66 GHz CPU. To parallelize the CDEM algorithm, the clone node force refreshing process is separated from the elemental calculation and replaced by a "Node Group" force assignment process, which ensures data independence during parallel execution.
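The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: each element writes forces into its own private "clone node" slots, and a separate pass sums the slots that belong to one shared node. The 1D spring elements, the function names, and the data layout are all assumptions made for illustration, not the authors' code.

```python
# Hypothetical sketch: 1D two-node spring elements stand in for CDEM elements.
# All names and the data layout are assumptions, not taken from the paper.

def compute_clone_forces(elements, u, k=1.0):
    """Elemental pass: each element writes only to its own two slots,
    so no two elements ever write to the same memory location."""
    clone_forces = []
    for a, b in elements:
        stretch = u[b] - u[a]
        # A stretched spring pulls node a forward and node b backward.
        clone_forces.append((k * stretch, -k * stretch))
    return clone_forces

def build_node_groups(elements, num_nodes):
    """For every shared node, list the (element, local index) slots it owns."""
    groups = [[] for _ in range(num_nodes)]
    for e, nodes in enumerate(elements):
        for local, node in enumerate(nodes):
            groups[node].append((e, local))
    return groups

def gather_node_forces(groups, clone_forces):
    """Node-group pass: each node reads its clone slots and writes one total.
    Every output entry has a single writer, so nodes are independent too."""
    return [sum(clone_forces[e][local] for e, local in group) for group in groups]

elements = [(0, 1), (1, 2)]          # two springs sharing node 1
u = [0.0, 1.0, 3.0]                  # nodal displacements
clone = compute_clone_forces(elements, u)
groups = build_node_groups(elements, num_nodes=3)
forces = gather_node_forces(groups, clone)
```

Both passes are written as serial loops here; the point of the split is that each loop iteration touches only its own output, which is what would let one GPU thread handle one element (first pass) or one node group (second pass) without atomic updates.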
In this paper, we propose a method for joint 2D segmentation and 2D-3D pose tracking. First, we define a novel energy functional that simultaneously considers the discrimination between statistical appearance models and the coherence among neighboring pixels. Then, a particle filter-like stochastic optimization technique is adopted to solve the energy functional, providing a better initial value for the subsequent damped Newton optimization. Furthermore, an occlusion-aware updating strategy is used for the appearance models, which can easily increase the foreground learning rate. As a result, our method is better suited to video sequences with occlusion. Experimental results show excellent performance on challenging synthetic and real-world sequences compared with state-of-the-art approaches.
Abstract. GPUs are high-performance co-processors of the CPU for scientific computing, including CFD. We present an optimistic shared memory allocation strategy for solving 2D CFD problems using the Red-Black SOR method on a GPU with CUDA (Compute Unified Device Architecture). Lid-driven results are compared with benchmark data. We discuss the speedup ratio for the same problem size on an NVIDIA GTX480 and an Intel Core-Dual 3.0GHz processor; the GPU is 120 times faster than the sequential code on the CPU at a problem size of 756 × 756. Based on this work, we conclude that using the memory hierarchy properly plays a key role in improving the computational performance of the GPU.
Introduction
Simulating CFD problems efficiently and accurately is of great importance for scientific computing and engineering applications. GPUs, originally designed for graphics rendering, have become massively parallel "co-processors" of the Central Processing Unit (CPU). In recent years, GPU technology has developed quickly, and a modern GPU can provide memory bandwidth and floating-point performance that are orders of magnitude higher than those of a standard CPU [1]. Researchers in the CFD field have done a great deal of work on parallel computing algorithms and applications on GPUs, with notable results. On the algorithmic side, Senocak [2] presented a 3D Navier-Stokes solver on GPU for incompressible flows using the Jacobi iteration method; Serban Georgescu [3] developed a Conjugate Gradient solver for the 3D Poisson's equation on GPU and reported up to 22 times acceleration when using three GPUs compared with a CPU; and Jonathan M. Cohen [4] implemented a 3D Boussinesq code with Red-Black Gauss-Seidel on GPU and obtained an acceleration of up to 8 times over a CPU.
Red-Black SOR, an efficient, simple, inexpensive and fully parallelizable method [5], is widely used in parallel computing on both CPU and GPU.
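To make concrete why Red-Black SOR is described as fully parallelizable, here is a minimal serial sketch on a model problem. The model problem (the 5-point discretisation of the Poisson equation with fixed boundary values), the function name `red_black_sor`, and the parameter values are assumptions chosen for illustration; this is not the paper's lid-driven cavity solver or its CUDA kernel.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """Relax -laplace(u) = f in place on a uniform grid with spacing h.
    Boundary rows and columns of u are treated as fixed Dirichlet values."""
    rows, cols = len(u), len(u[0])
    for _ in range(sweeps):
        for colour in (0, 1):  # 0 = "red" cells, 1 = "black" cells
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    # All four neighbours have the opposite colour, so cells
                    # of one colour never read each other within a half-sweep.
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1]
                                           + h * h * f[i][j])
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u

# Assumed test case: Laplace equation whose exact solution is u(x, y) = x.
n = 9
h = 1.0 / (n - 1)
f = [[0.0] * n for _ in range(n)]
grid = [[j * h if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
red_black_sor(grid, f, h, omega=1.5, sweeps=200)
max_error = max(abs(grid[i][j] - j * h) for i in range(n) for j in range(n))
```

Because every update in a half-sweep depends only on cells of the other colour, the two inner loops could in principle be replaced by one GPU thread per cell, with a synchronisation between the red and black half-sweeps.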
Chih-Wei Hsieh [6] implemented the Red-Black method for solving 2D parabolic partial differential equations on GPU and ran 11 times faster than on a CPU at a problem size of 400×400. Sheng-Hsiu Kuo [7] solved the 2D nonlinear Burgers' equation using the Red-Black SOR method on GPU and obtained a speedup ratio of 12 times over a CPU at a mesh size of 1026×1026. Jonathan M. Cohen [4] and Aaron F. Shinn [8] implemented the Red-Black SOR iteration method to solve 3D CFD problems on GPU with multi-grid relaxation schemes and achieved speedup ratios of 8 times and 15 times, respectively. As a highly parallel computational method, Red-Black SOR is well suited to GPU computing and, in our experience, can achieve a high speedup ratio if the memory hierarchy is used properly and memory is allocated efficiently.
The rest of this paper is organized as follows. Section 2 introduces the GPU hardware architecture and CUDA programming model. Section 3 briefly shows the governing equation and numerical method of incompressible fluid flows. The acceleration strategy of solving CFD problems on GPU is described in section 4 and the res...