2017
DOI: 10.1155/2017/4309381

An Efficient GPU-Based Out-of-Core LU Solver of Parallel Higher-Order Method of Moments for Solving Airborne Array Problems

Abstract: The parallel higher-order method of moments (HoMoM) with a GPU-accelerated out-of-core LU solver is presented for analysis of the radiation characteristics of a 1000-element antenna array over a full-size airplane. A parallel framework involving MPI and CUDA is adopted to ensure that the procedures run on a hybrid CPU/GPU cluster. An efficient two-level out-of-core scheme is designed to break the bottleneck of both GPU memory and physical memory when solving electrically large and complex problems. To hide communi…
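The out-of-core idea in the abstract can be illustrated at a single level: the matrix lives on a simulated disk as column slabs, and each slab is read in, updated by every previously factored slab (a left-looking scheme), factored, and written back, so only two slabs are ever in memory at once. This is a hypothetical sketch, not the authors' solver. It skips partial pivoting and therefore assumes a diagonally dominant matrix (MoM impedance matrices generally are not), uses a Python dict as a stand-in for disk, and does not model the second, host-to-GPU level. The names `split_into_slabs`, `out_of_core_lu` and `assemble` are made up for this example.

```python
from typing import Dict, List

Matrix = List[List[float]]


def split_into_slabs(a: Matrix, width: int) -> Dict[int, Matrix]:
    """Store an n-by-n matrix as column slabs keyed by their first column.

    The dict stands in for disk storage (hypothetical simplification).
    """
    n = len(a)
    return {j0: [row[j0:j0 + width] for row in a] for j0 in range(0, n, width)}


def out_of_core_lu(disk: Dict[int, Matrix], n: int, width: int) -> None:
    """Left-looking slab LU, A = L U, factored in place on the 'disk'.

    L is unit lower triangular (stored below the diagonal), U is upper.
    No pivoting: assumes a diagonally dominant matrix (illustration only).
    Only the active slab and one factored slab are touched at a time.
    """
    for j0 in range(0, n, width):
        active = [row[:] for row in disk[j0]]  # "read" slab j into memory
        w = len(active[0])
        # 1) Left-looking update with every previously factored slab k < j.
        for k0 in range(0, j0, width):
            done = disk[k0]  # "read" factored slab k
            for kk in range(len(done[0])):
                k = k0 + kk  # global index of the pivot row/column
                for i in range(k + 1, n):
                    lik = done[i][kk]  # L[i][k] taken from the factored slab
                    for c in range(w):
                        active[i][c] -= lik * active[k][c]
        # 2) Factor the slab's own columns (right-looking within the slab).
        for kk in range(w):
            k = j0 + kk
            pivot = active[k][kk]
            for i in range(k + 1, n):
                active[i][kk] /= pivot
                lik = active[i][kk]
                for c in range(kk + 1, w):
                    active[i][c] -= lik * active[k][c]
        disk[j0] = active  # "write" the factored slab back


def assemble(disk: Dict[int, Matrix], n: int, width: int) -> Matrix:
    """Gather the slabs back into one packed LU matrix (for checking)."""
    return [[v for j0 in range(0, n, width) for v in disk[j0][i]] for i in range(n)]
```

With `width` equal to `n` the loop reduces to an ordinary in-core LU; shrinking `width` lowers the memory footprint at the cost of more slab reads, which is the trade-off an out-of-core solver has to tune.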


Cited by 3 publications (3 citation statements). References: 31 publications.
“…An efficient strategy for the parallelization of the MLFMA is introduced in [34], using single-precision calculations to obtain a speedup of up to 98 with respect to the sequential version. In [33] a direct solver (LU decomposition) is applied to MoM obtaining a speedup of 38 relative also to the sequential version of the algorithm.…”
Section: A. Acceleration of the FE-IIEE Methods and Related Work · Classification: mentioning (confidence: 99%)
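The statement above notes that [33] applies a direct LU solver to MoM. In MoM the discretized problem is a dense complex system Z I = V (impedance matrix, unknown current coefficients, excitation vector), and a direct solve amounts to one LU factorization followed by a forward and a backward triangular solve. The in-core sketch below uses textbook Doolittle elimination with partial pivoting in pure Python; it is not the implementation from [33] or from the paper above, and `lu_factor` and `lu_solve` are hypothetical names (production codes would call a library routine such as LAPACK's `zgetrf`/`zgetrs`).

```python
from typing import List, Tuple

CMatrix = List[List[complex]]


def lu_factor(z: CMatrix) -> Tuple[CMatrix, List[int]]:
    """LU with partial pivoting: P Z = L U, returned packed with the permutation.

    Hypothetical helper for illustration; L has a unit diagonal (not stored).
    """
    n = len(z)
    a = [row[:] for row in z]  # work on a copy
    perm = list(range(n))  # perm[k] = original row now stored at row k
    for k in range(n):
        # Choose the row with the largest |entry| in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier L[i][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a, perm


def lu_solve(lu: CMatrix, perm: List[int], v: List[complex]) -> List[complex]:
    """Solve Z I = V given the packed factors from lu_factor."""
    n = len(lu)
    y = [v[perm[i]] for i in range(n)]  # apply the row permutation P
    for i in range(n):  # forward substitution: L y = P v
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):  # backward substitution: U x = y
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

Once Z is factored, each additional excitation vector V costs only the two triangular solves, which is one reason direct solvers remain attractive when many right-hand sides must be handled.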
“…5]. This is a common practice in the literature, e.g., [29], [30], [33], [34]. This analysis assesses the quality of the CUDA parallelization on a GPU and the performance obtained from a manycore architecture.…”
Section: A. Antenna Problem · Classification: mentioning (confidence: 99%)
“…[1] which performs a tiled Cholesky factorisation – in this case there is enough computational work per byte uploaded. Similar streaming techniques are used for computationally intensive algorithms in [11], and there are applications in visualisation as well [20,27].…”
Section: Related Work · Classification: mentioning (confidence: 99%)
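The remark about "enough computational work per byte uploaded" can be made concrete with a back-of-the-envelope arithmetic-intensity estimate for the matrix-multiply tile update C ← C − A·B that dominates tiled Cholesky and LU factorizations. The sketch assumes each b-by-b tile of A, B and C is copied to the GPU and C is copied back, all in double precision; real solvers reuse tiles across several updates, so the achieved ratio is usually higher. The function name `tile_update_intensity` is hypothetical, and the numbers are generic estimates, not results from the paper.

```python
def tile_update_intensity(b: int, complex_valued: bool = False) -> float:
    """Flops per byte transferred for one tile update C <- C - A @ B.

    Assumes three b-by-b tiles are uploaded and C is downloaded again,
    with 8-byte real or 16-byte complex double-precision elements.
    """
    elem_bytes = 16 if complex_valued else 8
    # One multiply-accumulate costs 2 real flops, or 8 for complex operands
    # (4 multiplications and 4 additions).
    flops_per_mac = 8 if complex_valued else 2
    flops = flops_per_mac * b ** 3
    bytes_moved = 4 * b * b * elem_bytes  # 3 uploads + 1 download
    return flops / bytes_moved


for b in (64, 256, 1024):
    print(b, tile_update_intensity(b), tile_update_intensity(b, True))
```

Under these assumptions the ratio is b/16 flop/byte for real and b/8 for complex data, so it grows linearly with the tile size while the transfer cost per element stays fixed. Large tiles therefore give the GPU enough work to hide host-device transfers, which is also why complex-valued MoM matrices suit this streaming approach.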