An Efficient GPU-Based Out-of-Core LU Solver of Parallel Higher-Order Method of Moments for Solving Airborne Array Problems

Lin, Zhifang; Chen, Yan; Zhang, Yu; Zhao, Xunwang; Zhang, Huan‐Huan

doi:10.1155/2017/4309381

Cited by 3 publications (3 citation statements)

References 31 publications


“…An efficient strategy for the parallelization of the MLFMA is introduced in [34], using single-precision calculations to obtain a speedup of up to 98 with respect to the sequential version. In [33] a direct solver (LU decomposition) is applied to MoM obtaining a speedup of 38 relative also to the sequential version of the algorithm.…”

Section: A. Acceleration of the FE-IIEE Methods and Related Work (mentioning)


GPU Acceleration of a Non-Standard Finite Element Mesh Truncation Technique for Electromagnetics

et al. 2020


The emergence of General-Purpose Graphics Processing Units (GPGPUs) provides new opportunities to accelerate applications involving a large number of regular computations. However, properly leveraging the computational resources of graphics processors is a very challenging task. In this paper, we use this kind of device to parallelize FE-IIEE (Finite Element-Iterative Integral Equation Evaluation), a non-standard finite element mesh truncation technique introduced by two of the authors. This application is computationally very demanding because of the amount, size and complexity of the data involved in the procedure. Moreover, an efficient implementation becomes even more difficult if the parallelization has to maintain the complex workflow of the original code. The proposed CUDA implementation applies several optimization techniques to improve performance, including leveraging the fastest memories of the GPU and increasing the granularity of the computations to reduce the impact of memory access. We have applied our parallel algorithm to two real radiation and scattering problems, demonstrating speedups of more than 140 on a state-of-the-art GPU.


“…5]. This is a common practice in the literature, e.g., [29], [30], [33], [34]. This analysis assesses the quality of the CUDA parallelization on a GPU and the performance obtained from a manycore architecture.…”

Section: A. Antenna Problem (mentioning)


GPU Acceleration of a Non-Standard Finite Element Mesh Truncation Technique for Electromagnetics

et al. 2020


“…[1] which performs a tiled Cholesky factorisation; in this case there is enough computational work per byte uploaded. Similar streaming techniques are used for computationally intensive algorithms in [11], and there are applications in visualisation as well [20,27].…”
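The remark that a tiled Cholesky factorisation has "enough computational work per byte uploaded" can be checked with a rough count. The sketch below is a back-of-the-envelope model only, not the cited code's transfer schedule. It assumes square b × b tiles of 8-byte real values, that the dominant tile update C ← C − A·Bᵀ costs about 2b³ flops, and that each of the three tiles crosses the bus exactly once per update. The function name is hypothetical.

```python
def tile_update_intensity(b: int, bytes_per_elem: int = 8) -> float:
    """Flops per byte for one b x b tile update C <- C - A @ B^T.

    Assumption (illustrative model): A, B and C are each transferred
    exactly once, and the update costs b^3 multiply-add pairs.
    """
    flops = 2 * b**3                           # b^3 multiply-add pairs
    bytes_moved = 3 * b * b * bytes_per_elem   # three b x b tiles moved
    return flops / bytes_moved


if __name__ == "__main__":
    for b in (64, 256, 1024, 4096):
        print(f"b = {b:5d}: {tile_update_intensity(b):8.1f} flop/byte")
```

Under these assumptions the ratio simplifies to b/12, so it grows linearly with tile size: doubling the tile edge doubles the work available to hide each transferred byte. Complex-valued tiles, as in MoM, change the constant but not the linear growth.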

Section: Related Work (mentioning)


Beyond 16GB

Reguly, Mudalige, Giles

2017

Proceedings of the Workshop on Memory Centric Programming for HPC


Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately, such architectures come with a limited amount of fast memory, which limits the size of the problems that can be efficiently solved. In this paper, we address this challenge by applying the well-known cache-blocking tiling technique to large-scale stencil codes implemented using the OPS domain-specific language, such as CloverLeaf 2D, CloverLeaf 3D, and OpenSBLI. We introduce a number of techniques and optimisations to help manage data resident in fast memory and minimise data movement. Evaluating our work on Intel's Knights Landing platform as well as NVIDIA P100 GPUs, we demonstrate that it is possible to solve problems 3 times larger than the on-chip memory size with at most 15% loss in efficiency.
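The cache-blocking idea in this abstract can be illustrated with a minimal sketch of spatial tiling for a single 5-point Jacobi sweep, written in pure Python. It shows only the loop restructuring: the interior is visited tile by tile, and each tile reads just its own cells plus a one-cell halo. That bounded footprint is what lets a tile's working set stay in fast memory. The grid, tile size and function names are illustrative assumptions. The OPS work described above tiles across sequences of loops and manages data movement explicitly, and this sketch does not model either.

```python
from typing import List

Grid = List[List[float]]


def jacobi_point(u: Grid, i: int, j: int) -> float:
    """Average of the four neighbours of interior point (i, j)."""
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def sweep_naive(u: Grid) -> Grid:
    """One Jacobi sweep over the whole interior, row by row."""
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = jacobi_point(u, i, j)
    return out


def sweep_tiled(u: Grid, tile: int) -> Grid:
    """Same sweep, but the interior is visited in tile x tile blocks, so
    each block touches only its own cells plus a one-cell halo."""
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]
    for ti in range(1, n - 1, tile):          # block row start
        for tj in range(1, m - 1, tile):      # block column start
            for i in range(ti, min(ti + tile, n - 1)):
                for j in range(tj, min(tj + tile, m - 1)):
                    out[i][j] = jacobi_point(u, i, j)
    return out


if __name__ == "__main__":
    n = 10
    grid = [[float((i * 7 + j * 3) % 5) for j in range(n)] for i in range(n)]
    assert sweep_tiled(grid, tile=3) == sweep_naive(grid)
    print("tiled and untiled sweeps agree")
```

For a single sweep, tiling changes only the visiting order, so the result is bit-for-bit identical to the untiled loop. The larger data-reuse gains generally come from tiling across several consecutive stencil loops, so that a tile loaded into fast memory is reused before it is evicted.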


Integrated Analysis and Optimization of the Large Airborne Radome-Enclosed Antenna System

Zhai, Zhao, Lin

2020

ACES


To enable integrated analysis and optimization of large airborne radome-enclosed antenna systems, a novel optimization strategy is proposed. It is based on an overlapping domain decomposition method that uses the higher-order MoM and an out-of-core solver (HO-OC-DDM), combined with adaptive mutation particle swarm optimization (AMPSO). Introducing a parallel out-of-core solver and DDM effectively removes the random access memory (RAM) limit. Using the domain decomposition method, the strategy decomposes a difficult global optimization problem into multiple domain-level optimization problems. Finally, numerical results for an airborne Yagi antenna system show that the proposed strategy makes the design of large airborne radome-enclosed antenna systems convenient and effective.
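The abstract credits the out-of-core solver with removing the RAM limit. The general idea can be sketched as a tiled LU factorization in which the matrix lives in slower storage and only a few tiles are brought into working memory at a time. This is a minimal illustration under stated assumptions, not the HO-OC-DDM solver. `TileStore` is a hypothetical in-memory dict standing in for disk files, the matrix is real-valued, and pivoting is omitted. Omitting pivoting is only safe for matrices such as diagonally dominant ones; a production dense solver would typically use partial pivoting and complex arithmetic.

```python
from typing import Dict, List, Tuple

Tile = List[List[float]]


class TileStore:
    """Hypothetical stand-in for out-of-core storage: every tile lives here,
    and the solver copies in only the tiles it is currently working on."""

    def __init__(self) -> None:
        self._tiles: Dict[Tuple[int, int], Tile] = {}
        self.loads = 0  # number of tile reads, i.e. simulated disk traffic

    def read(self, i: int, j: int) -> Tile:
        self.loads += 1
        return [row[:] for row in self._tiles[(i, j)]]

    def write(self, i: int, j: int, tile: Tile) -> None:
        self._tiles[(i, j)] = [row[:] for row in tile]


def lu_inplace(a: Tile) -> None:
    """Unpivoted LU of a diagonal tile; unit-lower L and U are packed into a."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]


def solve_unit_lower(lu: Tile, b: Tile) -> None:
    """b <- L^{-1} b, with L the unit-lower part of the packed tile lu."""
    for i in range(len(b)):
        for k in range(i):
            for j in range(len(b[0])):
                b[i][j] -= lu[i][k] * b[k][j]


def solve_upper_right(lu: Tile, b: Tile) -> None:
    """b <- b U^{-1}, with U the upper part of the packed tile lu."""
    for j in range(len(lu)):
        for k in range(j):
            for i in range(len(b)):
                b[i][j] -= b[i][k] * lu[k][j]
        for i in range(len(b)):
            b[i][j] /= lu[j][j]


def gemm_sub(c: Tile, a: Tile, b: Tile) -> None:
    """c <- c - a @ b."""
    for i in range(len(c)):
        for k in range(len(b)):
            for j in range(len(c[0])):
                c[i][j] -= a[i][k] * b[k][j]


def tiled_lu(store: TileStore, nt: int) -> None:
    """Right-looking tiled LU (no pivoting) on an nt x nt grid of tiles.
    Each step works on at most three tiles; the rest stay in the store."""
    for k in range(nt):
        akk = store.read(k, k)
        lu_inplace(akk)
        store.write(k, k, akk)
        for j in range(k + 1, nt):  # row panel: U_kj = L_kk^{-1} A_kj
            t = store.read(k, j)
            solve_unit_lower(akk, t)
            store.write(k, j, t)
        for i in range(k + 1, nt):  # column panel: L_ik = A_ik U_kk^{-1}
            t = store.read(i, k)
            solve_upper_right(akk, t)
            store.write(i, k, t)
        for i in range(k + 1, nt):  # trailing update: A_ij -= L_ik U_kj
            lik = store.read(i, k)
            for j in range(k + 1, nt):
                ukj = store.read(k, j)
                aij = store.read(i, j)
                gemm_sub(aij, lik, ukj)
                store.write(i, j, aij)
```

The `loads` counter grows with every tile transfer. In a real out-of-core code those transfers correspond to disk (or host-to-device) traffic, which tile size and loop order are chosen to keep small relative to the arithmetic.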
