Algorithmic and architecture-oriented optimizations are essential for achieving performance worthy of anticipated energy-austere exascale systems. In this paper, we present an extreme-scale FMM-accelerated boundary integral equation solver for wave scattering, which uses FMM as a matrix-vector multiplication inside the GMRES iterative method. Our FMM Helmholtz kernels are capable of treating nontrivial singular and near-field integration points. We implement highly optimized kernels for both shared and distributed memory, targeting emerging Intel extreme-performance HPC architectures. We extract the potential thread- and data-level parallelism of the key Helmholtz kernels of FMM. Our application code is well optimized to exploit the AVX-512 SIMD units of Intel Skylake and Knights Landing architectures. We provide different performance models for tuning the task-based tree traversal implementation of FMM, and develop optimal architecture-specific and algorithm-aware partitioning, load balancing, and communication-reducing mechanisms to scale up to 6,144 compute nodes of a Cray XC40 with 196,608 hardware cores. With shared memory optimizations, we achieve roughly 77% of peak single-precision floating-point performance of a 56-core Skylake processor, and on average 60% of peak single-precision floating-point performance of a 72-core KNL. These numbers represent nearly 5.4x and 10x speedup on Skylake and KNL, respectively, compared to the baseline scalar code. With distributed memory optimizations, on the other hand, we report near-optimal efficiency in the weak scalability study with respect to both the O(log P) communication complexity and the theoretical scaling complexity of FMM. In addition, we exhibit up to 85% efficiency in strong scaling. We compute in excess of 2 billion DoF at the full scale of the Cray XC40 supercomputer. The numerical results match the analytical solution with convergence at 1.0e-4 relative 2-norm residual accuracy.
To the best of our knowledge, this work presents the fastest and the most scalable FMM-accelerated linear solver for oscillatory kernels.
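The coupling described in this abstract, in which the fast multipole method only ever appears as a matrix-vector product inside GMRES, can be illustrated with a minimal matrix-free sketch. It is not the authors' implementation: the dense O(N^2) loop in `make_dense_matvec` is a hypothetical stand-in for the FMM, the identity self-term and the point set are assumptions chosen to give a well-conditioned toy system, and `gmres` is a plain unrestarted GMRES with Givens rotations. The default tolerance mirrors the 1.0e-4 relative residual quoted above.

```python
import cmath
import math


def dot(u, v):
    """Hermitian inner product sum(conj(u_i) * v_i)."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


def norm(u):
    return math.sqrt(sum(abs(ui) ** 2 for ui in u))


def make_dense_matvec(points, wavenumber):
    """Hypothetical stand-in for the FMM: direct O(N^2) evaluation of
    y = (I + G) x with G_ij = exp(i k r_ij) / (4 pi r_ij) for i != j."""
    def matvec(x):
        y = []
        for i, pi in enumerate(points):
            acc = complex(x[i])  # assumed identity self-term
            for j, pj in enumerate(points):
                if i != j:
                    r = abs(pi - pj)
                    acc += cmath.exp(1j * wavenumber * r) / (4 * math.pi * r) * x[j]
            y.append(acc)
        return y
    return matvec


def gmres(matvec, b, tol=1e-4, max_iter=50):
    """Unrestarted GMRES; the operator is touched only through matvec."""
    b = [complex(bi) for bi in b]
    n = len(b)
    beta = norm(b)
    if beta == 0.0:
        return [0j] * n, 0.0
    V = [[bi / beta for bi in b]]   # Krylov basis vectors
    R = []                          # columns of the rotated Hessenberg matrix
    rotations = []                  # Givens pairs (c, s)
    g = [complex(beta)]             # rotated right-hand side
    for j in range(max_iter):
        w = matvec(V[j])
        h = []
        for vi in V:                # modified Gram-Schmidt
            hij = dot(vi, w)
            w = [wk - hij * vk for wk, vk in zip(w, vi)]
            h.append(hij)
        h_next = norm(w)
        for i, (c, s) in enumerate(rotations):
            h[i], h[i + 1] = (c * h[i] + s * h[i + 1],
                              -s.conjugate() * h[i] + c * h[i + 1])
        a = h[j]
        d = math.hypot(abs(a), h_next)
        if d == 0.0:
            break                   # breakdown: no new direction
        if a == 0:
            c, s = 0.0, 1.0 + 0j
        else:
            c, s = abs(a) / d, (a / abs(a)) * h_next / d
        h[j] = c * a + s * h_next   # eliminates the subdiagonal entry
        rotations.append((c, s))
        R.append(h)
        g.append(-s.conjugate() * g[j])
        g[j] = c * g[j]
        if abs(g[j + 1]) <= tol * beta or h_next == 0.0:
            break
        V.append([wk / h_next for wk in w])
    k = len(R)
    y = [0j] * k
    for i in reversed(range(k)):    # back substitution on R y = g
        acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, k))
        y[i] = acc / R[i][i]
    x = [sum(y[col] * V[col][row] for col in range(k)) for row in range(n)]
    return x, abs(g[k]) / beta
```

Because `gmres` takes the operator as a callback, replacing the dense loop with a fast approximate product changes the per-iteration cost without touching the solver.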
A fully explicit marching-on-in-time (MOT) scheme for solving the time domain Kirchhoff (surface) integral equation to analyze transient acoustic scattering from rigid objects is presented. A higher-order Nyström method and a PE(CE)^m-type ordinary differential equation integrator are used for spatial discretization and time marching, respectively. The resulting MOT scheme uses the same time step size as its implicit counterpart (which also uses the Nyström method in space) without sacrificing the accuracy or stability of the solution. Numerical results demonstrate the accuracy, efficiency, and applicability of the proposed explicit MOT solver.
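For readers unfamiliar with the PE(CE)^m pattern named in this abstract, the following is a minimal sketch on a scalar ODE y' = f(t, y): predict (P), evaluate (E), then repeat correct-evaluate (CE) m times. The abstract does not give the integrator's coefficients, so the two-step Adams-Bashforth predictor, the trapezoidal corrector, and the forward Euler start-up used here are assumptions, and the function name `pece_m` is hypothetical.

```python
def pece_m(f, t0, y0, dt, n_steps, m=2):
    """Integrate y' = f(t, y) with a PE(CE)^m scheme.

    Assumed (not from the source): Adams-Bashforth 2 predictor,
    trapezoidal corrector, forward Euler predictor on the first step.
    """
    t, y = t0, y0
    f_curr = f(t, y)
    f_prev = None
    for _step in range(n_steps):
        # P: explicit prediction of y at t + dt
        if f_prev is None:
            y_new = y + dt * f_curr
        else:
            y_new = y + dt * (1.5 * f_curr - 0.5 * f_prev)
        # E: evaluate the right-hand side at the predicted value
        f_new = f(t + dt, y_new)
        # (CE)^m: correct, then re-evaluate, m times
        for _sweep in range(m):
            y_new = y + 0.5 * dt * (f_curr + f_new)
            f_new = f(t + dt, y_new)
        t, y = t + dt, y_new
        f_prev, f_curr = f_curr, f_new
    return y
```

Every update is an explicit formula in already-known quantities, so no system is solved within a step; the corrector sweeps play the role that the implicit solve plays in an implicit scheme.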
An explicit marching-on-in-time (MOT) scheme to efficiently solve the time domain magnetic field integral equation (TD-MFIE) with a large time step size (under a low-frequency excitation) is developed. The proposed scheme spatially expands the current using high-order nodal functions defined on curvilinear triangles discretizing the scatterer surface. Applying Nyström discretization, which uses this expansion, to the TD-MFIE, which is written as an ordinary differential equation (ODE) by separating the self-term contribution, yields a system of ODEs in unknown time-dependent expansion coefficients. A predictor-corrector method is used to integrate this system for samples of these coefficients. Since the Gram matrix arising from the Nyström discretization is block-diagonal, the resulting MOT scheme replaces the matrix "inversion" required at each time step by a product of the inverse block-diagonal Gram matrix and the right-hand side vector. It is shown that, upon convergence of the corrector updates, this explicit MOT scheme produces the same solution as its implicit counterpart, and is faster for large time step sizes. Index Terms: Marching-on-in-time (MOT), magnetic field integral equation (MFIE), Nyström method, predictor-corrector scheme.
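The step this abstract relies on, replacing a per-time-step matrix solve with a product of the inverse block-diagonal Gram matrix and the right-hand side, can be sketched as follows. The block sizes, the function names `invert_block` and `apply_block_diagonal`, and the use of Gauss-Jordan elimination are illustrative assumptions, not details from the paper.

```python
def invert_block(block):
    """Invert one small dense block by Gauss-Jordan elimination
    with partial pivoting (done once, before time marching)."""
    n = len(block)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(block)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular block")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def apply_block_diagonal(inv_blocks, rhs):
    """Multiply the block-diagonal inverse by rhs, one block at a time."""
    out = []
    offset = 0
    for inv in inv_blocks:
        n = len(inv)
        seg = rhs[offset:offset + n]
        out.extend(sum(inv[i][j] * seg[j] for j in range(n)) for i in range(n))
        offset += n
    return out
```

Since the blocks are independent, the inverses are computed once and each time step costs only small dense products, which is what makes the scheme explicit in practice.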
In this paper, a new single-grit model of the interaction between the workpiece and a single grit, considering both cutting and ploughing effects, is proposed to predict the material deformation and microgrinding forces. The proposed model predictions are compared to the experimental data of Single Crystal Diamond (SCD) cutting for validation. An extension of the single-grit model by stochastic distribution analysis to predict the entire microgrinding forces is also presented.
We design and develop a new high-performance implementation of a fast direct LU-based solver using low-rank approximations on massively parallel systems. The LU factorization is the most time-consuming step in solving systems of linear equations in the context of analyzing acoustic scattering from large 3D objects. The matrix equation is obtained by discretizing the boundary integral of the exterior Helmholtz problem using a higher-order Nyström scheme. The main idea is to exploit the inherent data sparsity of the matrix operator by performing local tile-centric approximations while still capturing the most significant information. In particular, the proposed LU-based solver leverages the Tile Low-Rank (TLR) data compression format as implemented in the Hierarchical Computations on Manycore Architectures (HiCMA) library to decrease the complexity of "classical" dense direct solvers from cubic to quadratic order. We taskify the underlying boundary integral kernels to expose fine-grained computations. We then employ the dynamic runtime system StarPU to orchestrate the scheduling of computational tasks on shared and distributed-memory systems. The resulting asynchronous execution compensates for the load imbalance due to the heterogeneous ranks, while mitigating the overhead of data motion. We assess the robustness of our TLR LU-based solver and study the qualitative impact of using different numerical accuracies. The new TLR LU factorization outperforms the state-of-the-art dense factorizations by up to an order of magnitude on various parallel systems, for analysis of scattering from large-scale 3D synthetic and real geometries.
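The data-sparsity idea behind the tile low-rank format can be shown on a single tile: an nb-by-nb off-diagonal tile of rank k is kept as two thin factors U and V (each nb-by-k) with tile = U V^T, so applying it to a vector costs O(nb k) rather than O(nb^2). The sketch below assumes the factors are already available and does not use or reproduce the HiCMA or StarPU APIs; all function names are hypothetical.

```python
def lowrank_tile_matvec(U, V, x):
    """Apply a compressed tile: y = U (V^T x), never forming U V^T."""
    k = len(U[0])
    t = [sum(V[j][r] * x[j] for j in range(len(V))) for r in range(k)]
    return [sum(U[i][r] * t[r] for r in range(k)) for i in range(len(U))]


def dense_tile(U, V):
    """Expand the tile explicitly; used here only to check the result."""
    k = len(U[0])
    return [[sum(U[i][r] * V[j][r] for r in range(k)) for j in range(len(V))]
            for i in range(len(U))]


def dense_matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def tile_storage(nb, k):
    """Number of stored entries: (dense tile, low-rank factors)."""
    return nb * nb, 2 * nb * k
```

For example, under these assumptions a tile with nb = 1000 and k = 20 stores 40,000 entries in place of 1,000,000; because k differs from tile to tile, the work per tile is uneven.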