The lattice Boltzmann method is fully discretized in space, time, and velocity, and its inherent parallelism makes it well suited to graphics processing unit (GPU) acceleration in large-scale fluid dynamics simulations. When the lattice Boltzmann method is used to simulate a fluid system with complex geometry, the flow field is usually compressed to reduce memory consumption, and fluid nodes are accessed indirectly to improve computational efficiency. We designed a pointer array that is the same size as the flow field and is based on the unified memory technology of the Compute Unified Device Architecture (CUDA) platform. The addresses of the fluid nodes are stored in this array, and the remaining nodes, which are not allocated, are marked as null. To recover the coordinates of the fluid nodes in the original flow field, we stored the address of each non-null pointer array element as part of the lattice attribute, at the end of the lattice attribute array, forming a cyclic pointer structure that tracks geometric information. We validated the feasibility of this addressing scheme with an experimental simulation of aqueous humor in the anterior segment of the eye, and tested its performance on GPUs of the Pascal, Volta, and Turing architectures. The present method distributes data carefully to generate fewer memory transactions and to reduce the number of accesses to global memory, achieving a performance improvement of approximately 18%.
Index Terms: Addressing scheme, complex geometry, graphics processing unit (GPU), lattice Boltzmann method.
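The cyclic pointer structure described in this abstract can be illustrated with a small host-side sketch. It is not the authors' implementation: the names `FluidNode`, `SparseLattice`, `at`, and `coords` are hypothetical, a two-dimensional D2Q9 lattice is assumed, and ordinary host memory stands in for the CUDA unified memory used in the paper. Each entry of the full-size pointer array holds the address of a compressed fluid node or null, and each fluid node holds the address of its own entry, so the grid coordinates can be recovered by pointer arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr int Q = 9;  // assumption: D2Q9 velocity set

// Hypothetical per-node record: distributions plus a back-pointer
// to this node's entry in the full-size pointer array.
struct FluidNode {
    double f[Q];
    FluidNode** slot;
};

class SparseLattice {
public:
    // is_fluid is a row-major mask of size nx * ny (nonzero = fluid).
    SparseLattice(int nx, int ny, const std::vector<char>& is_fluid)
        : nx_(nx), ny_(ny),
          grid_(static_cast<std::size_t>(nx) * ny, nullptr) {
        assert(is_fluid.size() == grid_.size());
        std::size_t count = 0;
        for (char c : is_fluid) count += (c != 0);
        nodes_.resize(count);  // sized once, so node addresses stay stable
        std::size_t next = 0;
        for (std::size_t i = 0; i < grid_.size(); ++i) {
            if (!is_fluid[i]) continue;     // non-fluid entries stay null
            grid_[i] = &nodes_[next];       // forward pointer: grid -> node
            nodes_[next].slot = &grid_[i];  // back pointer: node -> grid
            ++next;
        }
    }
    // Copying would leave the stored pointers aimed at the source object.
    SparseLattice(const SparseLattice&) = delete;
    SparseLattice& operator=(const SparseLattice&) = delete;

    // Indirect access: nullptr for non-fluid or out-of-range sites.
    FluidNode* at(int x, int y) {
        if (x < 0 || x >= nx_ || y < 0 || y >= ny_) return nullptr;
        return grid_[static_cast<std::size_t>(y) * nx_ + x];
    }

    // Recover (x, y) in the original flow field from the back-pointer.
    std::pair<int, int> coords(const FluidNode& n) const {
        const std::size_t i = static_cast<std::size_t>(n.slot - grid_.data());
        return {static_cast<int>(i % nx_), static_cast<int>(i / nx_)};
    }

    std::size_t fluid_count() const { return nodes_.size(); }

private:
    int nx_, ny_;
    std::vector<FluidNode*> grid_;  // same size as the flow field
    std::vector<FluidNode> nodes_;  // compressed storage, fluid nodes only
};
```

In this sketch a neighbor lookup during streaming would go through `at(x + dx, y + dy)`, and a null result marks a solid or unallocated site. How the paper lays out the attribute array on the device, and how it reduces memory transactions, is not specified in the abstract and is not modeled here.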
The chemical-potential multiphase lattice Boltzmann method (CP-LBM) satisfies thermodynamic consistency and Galilean invariance, achieves a very large density ratio, and expresses surface wettability easily. Instead of the traditional central difference scheme, the CP-LBM uses the Thomas algorithm to calculate the differences in multiphase simulations, which significantly improves the calculation accuracy but increases the computational complexity. In this study, we designed and implemented a parallel algorithm for the chemical-potential model on a graphics processing unit (GPU). Several strategies were used to optimize the GPU algorithm, such as coalesced access, instruction throughput, thread organization, memory access, and loop unrolling. Compared with a dual-Xeon 5117 CPU server, our methods achieved a 95-fold speedup on an NVIDIA RTX 2080Ti GPU and a 106-fold speedup on an NVIDIA Tesla P100 GPU. When the algorithm was extended to an environment with dual NVIDIA Tesla P100 GPUs, a 189-fold speedup was achieved and the workload of each GPU reached 96%.
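For readers unfamiliar with the Thomas algorithm mentioned in this abstract, the following is a minimal serial sketch of it as the standard direct solver for tridiagonal systems. The abstract does not state which system the CP-LBM solves, so the function name `thomas_solve` and the coefficient layout are assumptions for illustration only, and no pivoting is performed (the matrix is assumed to be diagonally dominant).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.
// a[0] and c[n-1] are ignored. Assumes no pivoting is required,
// e.g. a diagonally dominant matrix.
std::vector<double> thomas_solve(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 const std::vector<double>& c,
                                 const std::vector<double>& d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);

    std::vector<double> cp(n), dp(n), x(n);

    // Forward sweep: eliminate the sub-diagonal.
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }

    // Back substitution.
    x[n - 1] = dp[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {
        x[i] = dp[i] - cp[i] * x[i + 1];
    }
    return x;
}
```

Both sweeps carry a dependency from one row to the next, which is one reason the step costs more than an explicit central difference. A GPU version would more plausibly solve many independent lines concurrently than parallelize within a single solve, though the abstract does not say how the authors organize this.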