When graphics processing units (GPUs) are used as the computing platform, parallel granularity in computational fluid dynamics (CFD) generally refers to point-granularity parallelization, in which each grid point is the unit of parallel work. Commonly deployed implicit time advancement schemes force the parallel dimensionality to be reduced, which leaves time advancement as the only highly time-consuming step in the whole CFD procedure. In this paper, a block data-parallel lower-upper relaxation (BDPLUR) scheme based on Jacobi iteration and Roe's flux scheme is proposed and implemented on a GPU. Numerical experiments show that the BDPLUR scheme, especially when implemented on a GPU, converges approximately 10 times faster than the original data-parallel lower-upper relaxation (DPLUR) scheme and more than 100 times faster than the lower-upper symmetric Gauss-Seidel (LUSGS) scheme. The influence of different Courant-Friedrichs-Lewy (CFL) numbers on the convergence time is also discussed, and different viscous matrices are compared. Standard cases are used to verify the effectiveness of the BDPLUR scheme.
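The contrast between a Jacobi-based relaxation and a Gauss-Seidel sweep can be illustrated on a toy problem. The sketch below is not the BDPLUR scheme; it assumes a small, diagonally dominant tridiagonal system (constant diagonal `diag`, constant off-diagonal `off`) as a stand-in for the implicit operator, and all function names are hypothetical. It only shows why a Jacobi sweep is data-parallel (every point reads the old vector) while a Gauss-Seidel sweep carries a sequential dependency.

```python
def neighbours(x, i):
    """Left and right neighbour values, with zero outside the domain."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i < len(x) - 1 else 0.0
    return left, right


def jacobi_sweep(x, b, diag, off):
    """One Jacobi sweep: every point reads only the OLD vector x, so the
    loop iterations are independent (conceptually one GPU thread per point)."""
    new = [0.0] * len(x)
    for i in range(len(x)):
        left, right = neighbours(x, i)
        new[i] = (b[i] - off * (left + right)) / diag
    return new


def gauss_seidel_sweep(x, b, diag, off):
    """One forward Gauss-Seidel sweep: point i reads the value at i-1 that
    was updated in this same sweep, so the loop must run in order."""
    x = list(x)
    for i in range(len(x)):
        left, right = neighbours(x, i)
        x[i] = (b[i] - off * (left + right)) / diag
    return x


def residual_norm(x, b, diag, off):
    """Maximum absolute residual of the tridiagonal system."""
    worst = 0.0
    for i in range(len(x)):
        left, right = neighbours(x, i)
        worst = max(worst, abs(b[i] - diag * x[i] - off * (left + right)))
    return worst


def solve(sweep, b, diag, off, tol=1e-10, max_sweeps=10000):
    """Repeat a sweep from a zero initial guess until the residual is below
    tol. Returns the solution and the number of sweeps used."""
    x = [0.0] * len(b)
    for k in range(1, max_sweeps + 1):
        x = sweep(x, b, diag, off)
        if residual_norm(x, b, diag, off) < tol:
            return x, k
    return x, max_sweeps
```

Calling `solve(jacobi_sweep, [1.0] * 16, 4.0, -1.0)` and `solve(gauss_seidel_sweep, [1.0] * 16, 4.0, -1.0)` gives the same solution to within the tolerance. On this toy system Jacobi needs more sweeps, but each of its sweeps can be spread across all points at once, which is the trade-off that data-parallel relaxation schemes exploit on a GPU.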
Many studies in parallel computing have focused on accelerating computational fluid dynamics (CFD) with multicore hardware such as graphics processing units (GPUs). In GPU acceleration, CFD parallel granularity generally refers to point-granularity parallelization, in which each grid point is the unit of parallel work. For CFD, an implicit time advancement scheme is more efficient and faster than an explicit one. However, for commonly used implicit schemes such as the lower-upper symmetric Gauss-Seidel (LUSGS) scheme, the parallel dimensionality is reduced, which makes the procedure highly time-consuming. In this paper, the data-parallel lower-upper relaxation (DPLUR) scheme based on Jacobi iteration is used and implemented on a GPU. Numerical experiments show that point-granularity parallelization with the DPLUR scheme, especially when implemented on a GPU, computes much faster than dimensionality reduction with the LUSGS scheme. The influence of the number of Jacobi inner iteration steps (JIIS) on the convergence time is also discussed, and two JIIS optimization algorithms are proposed according to the characteristics of convergence. On the basis of the memory access pattern, a DPLUR red-black (DPRB) scheme is developed that converges faster and more stably than the conventional DPLUR scheme. Finally, some standard cases are used to verify the effectiveness of the DPRB scheme with the JIIS optimization algorithms.
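The abstract does not spell out the DPRB update, so the following is only a sketch of classic red-black ordering on an assumed toy problem (a diagonally dominant tridiagonal system with constant diagonal `diag` and off-diagonal `off`), not the paper's scheme. The function names and the `reverse_within_color` flag are hypothetical. Points are split into two colors so that each point's neighbours all have the other color; within one color the updates are independent and can run in parallel, while the second color already sees the fresh values of the first.

```python
def red_black_sweep(x, b, diag, off, reverse_within_color=False):
    """One red-black sweep on a 1D tridiagonal system.

    Color 0 ("red") holds the even indices and color 1 ("black") the odd
    ones. A point's neighbours always have the other color, so the order
    of updates inside one color does not matter.
    """
    x = list(x)
    n = len(x)
    for color in (0, 1):
        indices = list(range(color, n, 2))
        if reverse_within_color:
            # Only used to demonstrate order independence within a color.
            indices.reverse()
        for i in indices:
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            x[i] = (b[i] - off * (left + right)) / diag
    return x


def max_residual(x, b, diag, off):
    """Maximum absolute residual of the tridiagonal system."""
    n = len(x)
    worst = 0.0
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(b[i] - diag * x[i] - off * (left + right)))
    return worst
```

Running `red_black_sweep` with and without `reverse_within_color=True` on the same input returns identical vectors, which is the property that lets each color be processed as one data-parallel pass.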