waLBerla is a massively parallel software framework for simulating complex flows with the lattice Boltzmann method (LBM). Performance and scalability results are presented for SuperMUC, the world's fastest x86-based supercomputer, ranked number 6 on the Top500 list, and JUQUEEN, a Blue Gene/Q system ranked number 5. We reach resolutions with more than one trillion cells and perform up to 1.93 trillion cell updates per second using 1.8 million threads. The design and implementation of waLBerla are driven by a careful analysis of the performance on current petascale supercomputers. Our fully distributed data structures and algorithms allow for efficient, massively parallel simulations on these machines. Elaborate node-level optimizations and vectorization using SIMD instructions result in highly optimized compute kernels for the single- and two-relaxation-time LBM. Excellent weak and strong scaling is achieved for a complex vascular geometry of the human coronary tree.
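The node-level optimizations mentioned above hinge on data layouts and loop structures that allow the compiler to emit SIMD code. As a rough illustration only, and not waLBerla's actual kernel, the following sketch shows a single-relaxation-time (BGK) collision step on a D2Q9 lattice in structure-of-arrays layout with a unit-stride inner loop suited to vectorization; all names (D2Q9Field, collideBGK) are illustrative assumptions.

```cpp
// Minimal sketch (not waLBerla code): a single-relaxation-time (BGK) collision
// step for a D2Q9 lattice, stored in structure-of-arrays layout so that the
// innermost loop over x is contiguous and amenable to compiler vectorization.
#include <array>
#include <cstddef>
#include <vector>

struct D2Q9Field {
    std::size_t nx, ny;
    // pdf[q] holds the distribution f_q for all cells, index = y * nx + x
    std::array<std::vector<double>, 9> pdf;
};

// D2Q9 lattice velocities and weights
constexpr int cx[9] = { 0, 1, 0, -1, 0, 1, -1, -1, 1 };
constexpr int cy[9] = { 0, 0, 1, 0, -1, 1, 1, -1, -1 };
constexpr double w[9] = { 4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                          1.0/36, 1.0/36, 1.0/36, 1.0/36 };

void collideBGK(D2Q9Field& f, double omega) {
    for (std::size_t y = 0; y < f.ny; ++y) {
        const std::size_t row = y * f.nx;
        // innermost loop: unit-stride accesses, no branches -> SIMD friendly
        #pragma omp simd
        for (std::size_t x = 0; x < f.nx; ++x) {
            const std::size_t i = row + x;
            // compute density and velocity moments
            double rho = 0.0, ux = 0.0, uy = 0.0;
            for (int q = 0; q < 9; ++q) {
                const double fq = f.pdf[q][i];
                rho += fq;
                ux  += fq * cx[q];
                uy  += fq * cy[q];
            }
            ux /= rho;
            uy /= rho;
            const double usq = ux * ux + uy * uy;
            // relax each population toward its equilibrium
            for (int q = 0; q < 9; ++q) {
                const double cu  = 3.0 * (cx[q] * ux + cy[q] * uy);
                const double feq = w[q] * rho * (1.0 + cu + 0.5 * cu * cu - 1.5 * usq);
                f.pdf[q][i] += omega * (feq - f.pdf[q][i]);
            }
        }
    }
}
```

A two-relaxation-time variant would split the populations into symmetric and antisymmetric parts with separate relaxation rates, but the loop structure and memory layout stay the same.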
The lattice Boltzmann method exhibits excellent scalability on current supercomputing systems and has thus increasingly become an alternative method for large-scale non-stationary flow simulations, reaching up to a trillion (10¹²) grid nodes. Additionally, grid refinement can lead to substantial savings in memory and compute time. These savings, however, come at the cost of much more complex data structures and algorithms. In particular, the interface between subdomains with different grid sizes must receive special treatment. In this article, we present parallel algorithms, distributed data structures, and communication routines that are implemented in the software framework waLBerla in order to support large-scale, massively parallel lattice Boltzmann-based simulations on non-uniform grids. Additionally, we evaluate the performance of our approach on two current petascale supercomputers. On an IBM Blue Gene/Q system, the largest weak scaling benchmarks with refined grids are executed with almost two million threads, demonstrating not only near-perfect scalability but also an absolute performance of close to a trillion lattice Boltzmann cell updates per second. On an Intel-based system, strong scaling of a simulation with refined grids and a total of more than 8.5 million cells is demonstrated to reach a runtime of less than one millisecond per time step. This enables simulations with complex, non-uniform grids and four million time steps per hour of compute time.
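One point worth illustrating is the special treatment at level interfaces: a block refined once has half the cell size and advances with half the time step, so two fine steps nest inside each coarse step, with interface exchanges in between. The following sketch is purely conceptual, uses hypothetical hooks (streamCollideOnLevel, exchangeCoarseToFine, exchangeFineToCoarse), and is not the framework's actual scheduling code.

```cpp
// Conceptual sketch (not the framework's actual code) of level-wise time
// stepping on a non-uniform grid: a block on level L+1 has half the cell size
// and half the time step of level L, so it performs two steps per coarse step.
// Coarse/fine interface data is exchanged between the sub-steps.
#include <functional>

struct GridHierarchy {
    int finestLevel;
    // illustrative hooks; a real framework operates on distributed blocks
    std::function<void(int)> streamCollideOnLevel;   // advance all blocks of one level
    std::function<void(int)> exchangeCoarseToFine;   // interpolate data to fine ghost layers
    std::function<void(int)> exchangeFineToCoarse;   // restrict the fine solution back
};

// Advance level 'lvl' by one of its own time steps, recursively advancing
// all finer levels by the corresponding number of smaller steps.
void advanceLevel(GridHierarchy& grid, int lvl) {
    grid.streamCollideOnLevel(lvl);
    if (lvl < grid.finestLevel) {
        grid.exchangeCoarseToFine(lvl + 1);   // provide boundary data to the finer level
        advanceLevel(grid, lvl + 1);          // first fine sub-step
        advanceLevel(grid, lvl + 1);          // second fine sub-step
        grid.exchangeFineToCoarse(lvl);       // feed the fine solution back to the coarse level
    }
}

// One time step of the coarsest level then drives the whole hierarchy:
//   advanceLevel(grid, 0);
```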
Programming current supercomputers efficiently is a challenging task. Multiple levels of parallelism on the core, on the compute node, and between nodes need to be exploited to make full use of the system. Heterogeneous hardware architectures with accelerators further complicate the development process. waLBerla addresses these challenges by providing the user with highly efficient building blocks for developing simulations on block-structured grids. The block-structured domain partitioning is flexible enough to handle complex geometries, while the structured grid within each block allows for highly efficient implementations of stencil-based algorithms. We present several example applications realized with waLBerla, ranging from lattice Boltzmann methods to rigid particle simulations. Most importantly, these methods can be coupled together, enabling multiphysics simulations. The framework uses meta-programming techniques to generate highly efficient code for CPUs and GPUs from a symbolic method formulation. To ensure software quality and performance portability, a continuous integration toolchain automatically runs an extensive test suite encompassing multiple compilers, hardware architectures, and software configurations.
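The central design idea, a flexible partitioning into blocks with a regular grid inside each block, can be pictured with a minimal sketch. The names below (Block, jacobiSweep) are illustrative and do not reflect waLBerla's API; the point is only that per-block kernels remain simple, regular loop nests, while parallelism and geometry handling live at the block level.

```cpp
// Minimal sketch (illustrative names, not waLBerla's API) of the
// block-structured idea: the domain is split into blocks, each block carries a
// small uniform grid with one ghost layer, and a stencil kernel runs per block.
#include <cstddef>
#include <vector>

struct Block {
    int bx, by, bz;                 // block coordinates in the domain partitioning
    std::size_t nx, ny, nz;         // interior cells per block (uniform within the block)
    std::vector<double> data;       // (nx+2)*(ny+2)*(nz+2) values incl. ghost layer

    double& at(std::size_t x, std::size_t y, std::size_t z) {
        return data[(z * (ny + 2) + y) * (nx + 2) + x];
    }
};

// Apply a 7-point stencil (e.g. a Jacobi sweep) to the interior of one block.
// Because the grid inside a block is regular, the loop nest is simple and can
// be optimized aggressively -- or generated from a symbolic description.
void jacobiSweep(Block& b, std::vector<double>& out) {
    out.resize(b.data.size());
    for (std::size_t z = 1; z <= b.nz; ++z)
        for (std::size_t y = 1; y <= b.ny; ++y)
            for (std::size_t x = 1; x <= b.nx; ++x)
                out[(z * (b.ny + 2) + y) * (b.nx + 2) + x] =
                    (b.at(x-1,y,z) + b.at(x+1,y,z) +
                     b.at(x,y-1,z) + b.at(x,y+1,z) +
                     b.at(x,y,z-1) + b.at(x,y,z+1)) / 6.0;
}

// A simulation loops over the locally stored blocks, exchanges ghost layers
// with neighboring blocks (possibly on other processes), and calls the
// per-block kernel -- the kernel itself never needs to know about parallelism.
```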
The motion of ionic solutes and charged particles under the influence of an electric field and the ensuing hydrodynamic flow of the underlying solvent is ubiquitous in aqueous colloidal suspensions. The physics of such systems is described by a coupled set of differential equations, along with boundary conditions, collectively referred to as the electrokinetic equations. Capuani et al. [J. Chem. Phys. 121, 973 (2004)] introduced a lattice-based method for solving this system of equations, which builds upon the lattice Boltzmann algorithm for the simulation of hydrodynamic flow and exploits computational locality. However, thus far, a description of how to incorporate moving boundary conditions into the Capuani scheme has been lacking. Moving boundary conditions are needed to simulate multiple arbitrarily moving colloids. In this paper, we detail how to introduce such a particle coupling scheme, based on an analogue to the moving boundary method for the pure lattice Boltzmann solver. The key ingredients in our method are mass and charge conservation for the solute species and a partial-volume smoothing of the solute fluxes to minimize discretization artifacts. We demonstrate our algorithm's effectiveness by simulating the electrophoresis of charged spheres in an external field; for a single sphere we compare to the equivalent electro-osmotic (co-moving) problem. Our method's efficiency and ease of implementation should prove beneficial to future simulations of the dynamics in a wide range of complex nanoscopic and colloidal systems that were previously inaccessible to lattice-based continuum algorithms.
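To give a flavor of the conservation idea only, and not the published algorithm itself, the sketch below shows how the solute content of a lattice cell newly covered by a moving particle could be redistributed to neighboring fluid cells so that solute mass, and hence charge, is conserved. The equal-share redistribution and all names (SoluteGrid, redistributeOnCover) are simplifying assumptions; the actual scheme additionally applies a partial-volume smoothing of the fluxes.

```cpp
// Conceptual sketch (not the published algorithm's code) of the conservation
// idea behind a moving-boundary solute coupling: when a fluid cell is newly
// claimed by a particle, its solute content is not discarded but handed to
// the adjacent cells that remain fluid, conserving the total amount (and
// therefore the total charge) of each species.
#include <array>
#include <cstddef>
#include <vector>

struct SoluteGrid {
    std::size_t nx, ny, nz;
    std::vector<double> concentration;   // one species; per-cell solute amount
    std::vector<bool>   isFluid;         // false where a particle covers the cell

    std::size_t idx(std::size_t x, std::size_t y, std::size_t z) const {
        return (z * ny + y) * nx + x;
    }
};

// Redistribute the solute of a cell that has just become solid to its fluid
// face neighbors in equal parts (a real scheme would weight this, e.g. by
// partial cell volumes, to reduce discretization artifacts).
// Assumes the cell lies in the interior of the grid.
void redistributeOnCover(SoluteGrid& g, std::size_t x, std::size_t y, std::size_t z) {
    const std::array<std::array<int, 3>, 6> dirs = {{
        {1,0,0}, {-1,0,0}, {0,1,0}, {0,-1,0}, {0,0,1}, {0,0,-1} }};
    std::vector<std::size_t> fluidNeighbors;
    for (const auto& d : dirs) {
        const std::size_t n = g.idx(x + d[0], y + d[1], z + d[2]);
        if (g.isFluid[n]) fluidNeighbors.push_back(n);
    }
    const std::size_t c = g.idx(x, y, z);
    if (!fluidNeighbors.empty()) {
        const double share = g.concentration[c] / fluidNeighbors.size();
        for (std::size_t n : fluidNeighbors) g.concentration[n] += share;
    }
    g.concentration[c] = 0.0;   // the covered cell no longer carries solute
    g.isFluid[c] = false;
}
```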
In this article, we present a novel approach for block-structured adaptive mesh refinement (AMR) that is suitable for extreme-scale parallelism. All data structures are designed such that the size of the meta data in each distributed processor memory remains bounded independent of the processor number. In all stages of the AMR process, we use only distributed algorithms. No central resources such as a master process or replicated data are employed, so that unlimited scalability can be achieved. For the dynamic load balancing in particular, we propose to exploit the hierarchical nature of the block-structured domain partitioning by creating a lightweight, temporary copy of the core data structure. This copy acts as a local and fully distributed proxy data structure. It does not contain simulation data, but only provides topological information about the domain partitioning into blocks. Ultimately, this approach enables an inexpensive, local, diffusion-based dynamic load balancing scheme. We demonstrate the excellent performance and the full scalability of our new AMR implementation for two architecturally different petascale supercomputers. Benchmarks on an IBM Blue Gene/Q system with a mesh containing 3.7 trillion unknowns distributed to 458,752 processes confirm the applicability for future extreme-scale parallel machines. The algorithms proposed in this article operate on blocks that result from the domain partitioning. This concept and its realization support the storage of arbitrary data. In consequence, the software framework can be used for different simulation methods, including mesh-based and meshless methods. In this article, we demonstrate fluid simulations based on the lattice Boltzmann method.

1.2. Related Work. Software frameworks for block-structured adaptive mesh refinement (SAMR) have been available for the last three decades. Recently, many SAMR codes have been compared in terms of their design, capabilities, and limitations in [22]. All codes covered in this survey can run on large-scale parallel systems, are written in C/C++ or Fortran, and are publicly available. Moreover, almost all of these software packages can, among other approaches, make use of space filling curves (SFCs) during load balancing. Some of the SAMR codes are focused on specific applications and methods, while others are more generic and provide the building blocks for a larger variety of computational models. The codes also differ in the extent to which their underlying data structures require the redundant replication and synchronization of meta data among all processes. Meta data that increases with the size of the simulation is often an issue on large-scale parallel systems, and eliminating this need for global meta data replication is a challenge that all SAMR codes are facing. Both BoxLib [9] and Chombo [1], with Chombo being a fork of BoxLib that started in 1998, are general SAMR frameworks that are not tied to a specific application. Both, however, rely on a patc...
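The diffusion-based balancing on the proxy data structure described above can be pictured as each process repeatedly shifting part of its load surplus toward less loaded neighbors in the process graph. The sketch below is illustrative only, uses assumed names (NeighborInfo, diffusionStep) and a simple damped exchange rule, and is not the framework's implementation.

```cpp
// Conceptual sketch (illustrative names, not the framework's API) of a single
// iteration of diffusion-based load balancing on a lightweight proxy
// structure: each process compares its load with its neighbors' loads and
// decides how much load to ship toward less loaded neighbors.
#include <cstddef>
#include <vector>

struct NeighborInfo {
    int    rank;        // neighboring process
    double load;        // its current load (e.g. number or weight of its blocks)
};

// Returns, per neighbor, how many units of load this process should send to
// that neighbor in this iteration. alpha < 1 damps the exchange so that the
// scheme converges over several iterations.
std::vector<double> diffusionStep(double myLoad,
                                  const std::vector<NeighborInfo>& neighbors,
                                  double alpha = 0.5) {
    std::vector<double> outflow(neighbors.size(), 0.0);
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        const double diff = myLoad - neighbors[i].load;
        if (diff > 0.0)
            outflow[i] = alpha * diff / 2.0;   // send part of the surplus to the lighter neighbor
    }
    return outflow;
}

// Because the proxy blocks carry only topological information, the expensive
// migration of actual simulation data happens only once, after the iterative
// scheme has settled on the final target process of every block.
```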