In this paper, a massively parallel approach of the multilevel fast multipole algorithm (PMLFMA) on graphics processing unit (GPU) heterogeneous platform, noted as GPU-PMLFMA, is presented for solving extremely large electromagnetic scattering problems involving tens of billions of unknowns, In this approach, the flexible and efficient ternary partitioning scheme is employed at first to partition the MLFMA octree among message passing interface (MPI) processes. Then the computationally intensive parts of the PMLFMA on each MPI process, matrix filling, aggregation and disaggregation, etc., are accelerated by using the GPU. Different parallelization strategies in coincidence with the ternary parallel MLFMA approach are designed for GPU to ensures a high computational throughput. Special memory usage strategy is designed to improve the computational efficiency and benefit data re-using. The CPU/GPU asynchronous computing pattern is designed with the OpenMP and CUDA respectively for accelerating the CPU and GPU execution parts and computation time overlapped. GPU architecture-based optimization strategies are implemented to further improve the computational efficiency. Numerical results demonstrate that the proposed GPU-PMLFMA can achieve over 3 times speed-up, compared with the 8-threaded conventional PMLFMA. Solutions of scattering by electrically large and complicated objects with about 24000 wavelengths and over 41.8 billion unknowns, are presented.
Since the first working multilevel fast multipole algorithm (MLFMA) for electromagnetic simulations was proposed by Chew's group in 1995, this algorithm has been recognized as one of the most powerful tools for numerical solutions of extremely large electromagnetic problems with complex geometries. It has been parallelized with different strategies to explore the computing power of supercomputers, increasing the size of solvable problems from millions to tens of billions of unknowns, thereby addressing the crucial demand arising from practical applications in a sense. This paper provides a comprehensive review of state-of-the-art parallel approaches of the MLFMA, especially on a newly proposed ternary parallelization scheme and its acceleration on graphics processing unit (GPU) clusters. We discuss and numerically study the advantages of the ternary parallelization scheme and demonstrate its flexibility and efficiency.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.