Efficient Scalable Algorithms for Solving Dense Linear Systems with Hierarchically Semiseparable Structures

Wang, Shen; Li, Xiaoye S.; Xia, Jianlin; Situ, Yingchong; Hoop, Maarten V. de

doi:10.1137/110848062

Cited by 53 publications

(59 citation statements)

References 27 publications


“…Here, since the algorithm frequently involves skinny matrices, the process grids are not necessarily square. This is different from those in [28,29], where large local matrices are involved. In this setting, each process can only access a subset of nodes in the assembly tree.…”

Section: 1 (mentioning)

confidence: 67%


A Distributed-Memory Randomized Structured Multifrontal Method for Sparse Direct Solutions

Xin, Xia, Hoop, et al. 2017

SIAM J. Sci. Comput.

Self Cite


Abstract. We design a distributed-memory randomized structured multifrontal solver for large sparse matrices. Two layers of hierarchical tree parallelism are used. A sequence of innovative parallel methods are developed for randomized structured frontal matrix operations, structured update matrix computation, skinny extend-add operation, selected entry extraction from structured matrices, etc. Several strategies are proposed to reuse computations and reduce communications. Unlike an earlier parallel structured multifrontal method that still involves large dense intermediate matrices, our parallel solver performs the major operations in terms of skinny matrices and fully structured forms. It thus significantly enhances the efficiency and scalability. Systematic communication cost analysis shows that the numbers of words are reduced by factors of about $O(\sqrt{n}/r)$ in two dimensions and about $O(n^{2/3}/r)$ in three dimensions, where $n$ is the matrix size and $r$ is an off-diagonal numerical rank bound of the intermediate frontal matrices. The efficiency and parallel performance are demonstrated with the solution of some large discretized PDEs in two and three dimensions. Nice scalability and significant savings in the cost and memory can be observed from the weak and strong scaling tests, especially for some 3D problems discretized on unstructured meshes.
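To get a feel for the communication savings stated in this abstract, the sketch below evaluates the two quantities sqrt(n)/r and n^(2/3)/r. The abstract gives them only as asymptotic orders ("about O(...)"), so the hidden constants are ignored here and the numbers indicate growth trends rather than measured savings. The function names and the sample values of n and r are illustrative assumptions, not taken from the paper.

```python
import math


def reduction_factor_2d(n: int, r: int) -> float:
    """Order-of-magnitude word-count reduction in 2D: sqrt(n) / r (constants ignored)."""
    return math.sqrt(n) / r


def reduction_factor_3d(n: int, r: int) -> float:
    """Order-of-magnitude word-count reduction in 3D: n^(2/3) / r (constants ignored)."""
    return n ** (2.0 / 3.0) / r


if __name__ == "__main__":
    # Hypothetical matrix sizes and rank bound, chosen only for illustration.
    r = 100
    for n in (10**6, 10**7, 10**8):
        print(f"n={n:>10}  2D: {reduction_factor_2d(n, r):10.1f}  "
              f"3D: {reduction_factor_3d(n, r):10.1f}")
```

For a fixed rank bound r, the 3D quantity grows faster with n than the 2D one, which is consistent with the abstract's remark that the savings are especially visible for 3D problems.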


Section: 1 (mentioning)

confidence: 67%

“…This can help reduce the number of messages exchanged and save the communication cost. See [28,29] for more discussions. Here, since the algorithm frequently involves skinny matrices, the process grids are not necessarily square.…”

Section: 1 (mentioning)

confidence: 99%

A Distributed-Memory Randomized Structured Multifrontal Method for Sparse Direct Solutions

Xin, Xia, Hoop, et al. 2017

SIAM J. Sci. Comput.

Self Cite


“…Further, a parallel hierarchical ACA algorithm demonstrating an acceleration factor larger than 200 was presented in [31]. This paper proposes a novel fast scalable HO parallel algorithm for large and complex scattering, radiation, and propagation problems in CEM based on the DHO MoM-SIE modeling in the frequency domain (FD) [22], [24], [32], [33] in conjunction with a direct solver for dense linear systems using HSS matrices [34], namely, the DHO HSS-MoM-SIE method. We are developing asymptotically fast HO direct algorithms for MoM-SIE solutions which, in a nutshell, are an algebraic generalization to FMMs.…”

Section: Efficient Scalable Parallel Higher Order Direct (mentioning)

confidence: 99%

“…The HSS algorithm is shown to have excellent parallel scalability. Our work uses the recently developed new, state-of-the-art, algorithms for solving dense and sparse linear systems of equations based on the HSS algorithm [34]. The new HSS algorithm has been demonstrated to have a dramatic advantage in terms of time and space complexity (e.g., ∼70 times less memory for seismic imaging examples with matrix size 250 000 × 250 000) over the LU factorization algorithm, and to be extremely scalable.…”

Section: Efficient Scalable Parallel Higher Order Direct (mentioning)

confidence: 99%

Efficient Scalable Parallel Higher Order Direct MoM-SIE Method With Hierarchically Semiseparable Structures for 3-D Scattering

Manic, Smull, Rouet, et al. 2017

IEEE Trans. Antennas Propagat.


Abstract: A novel fast scalable parallel algorithm is proposed for the solution of large 3-D scattering problems based on: 1) the double (geometrical and current-approximation) higher order (DHO) method of moments (MoM) in the surface integral equation (SIE) formulation and 2) a direct solver for dense linear systems utilizing hierarchically semiseparable (HSS) structures. Namely, an HSS matrix representation is used for compression, factorization, and solution of the system matrix. In addition, a rank-revealing QR decomposition for memory compression is used, with a stopping criterion in terms of the relative rank tolerance value. A method for geometrical preprocessing of the scatterers based on the cobblestone distance sorting technique is employed in order to enhance the HSS algorithm accuracy and parallelization. Numerical examples show how the accuracy of the DHO HSS-MoM-SIE method is easily controllable by using the relative tolerance for the matrix compression. Moreover, the examples demonstrate low memory consumption, as well as much faster simulation time, when compared to the direct LU decomposition. The method enables dramatically faster monostatic scattering computations than iterative solvers and reduced number of unknowns when compared to low-order discretizations. Finally, great scalability of the algorithm is demonstrated on more than one thousand processes.

Index Terms: Curved parametric elements, direct solvers, fast solvers, hierarchically semiseparable (HSS) structures, higher order (HO) modeling, low-rank matrix approximation, method of moments (MoM), multilevel matrix compression, numerical algorithms, parallelization, polynomial basis functions, scalability, scattering, surface integral equation (SIE).


“…4. We use the parameter triplets $(n_i, p_i, r_i)$ where $p = (4, 16, 64, 256, 1024, 4096)$, $n = (2.5, 5, 10, 20, 40, 80) \cdot 10^3$, and $r = (5, 5, 5, 5, 6, 7)$, based on the parameters for parallel HSS performance tests in [10]. Note that for Grid we only use the first 3 parameter triplets since $p_{\max} = 125$.…”

Section: Performance Modelling (mentioning)

confidence: 99%
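The parameter triplets in the citation statement above can be written out explicitly. The sketch below pairs the three sequences element by element and then keeps only the triplets whose process count fits within a machine limit. Reading the restriction as p_i <= p_max is an assumption; it is consistent with the quoted remark that only the first three triplets are used when p_max = 125. The names `P`, `N`, `R`, and `triplets` are hypothetical.

```python
# Values copied from the quoted citation statement.
P = (4, 16, 64, 256, 1024, 4096)                                  # process counts p_i
N = tuple(int(x * 10**3) for x in (2.5, 5, 10, 20, 40, 80))       # matrix sizes n_i
R = (5, 5, 5, 5, 6, 7)                                            # rank parameters r_i


def triplets(p_max=None):
    """Return the (n_i, p_i, r_i) triplets, optionally keeping only p_i <= p_max.

    The p_i <= p_max filter is an assumed reading of the quoted restriction.
    """
    all_triplets = list(zip(N, P, R))
    if p_max is None:
        return all_triplets
    return [t for t in all_triplets if t[1] <= p_max]


if __name__ == "__main__":
    for n, p, r in triplets(p_max=125):
        print(f"n={n:6d}  p={p:4d}  r={r}")
```

With the limit set to 125, the filter stops before p = 256, so exactly the first three triplets remain.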

Exploiting Data Sparsity in Parallel Matrix Powers Computations

Knight, Carson, Demmel 2013


Nicholas Knight, Erin Carson, James Demmel, University of California, Berkeley, {knight,ecc2z,demmel}@cs.berkeley.edu. Copyright © 2013, by the author(s).

Acknowledgement: We acknowledge funding from Microsoft (award #024263) and Intel (award #024894), and matching funding by UC Discovery (award #DIG07-10227), with additional support from ParLab affiliates National Instruments, Nokia, NVIDIA, Oracle, and Samsung, and support from MathWorks. We also acknowledge the support of the US DOE (grants DE-SC0003959, DE-SC0004938, DE-SC0005136, DE-SC0008700, DE-AC02-05CH11231, DE-FC02-06ER25753, and DE-FC02-07ER25799) and DOD (DARPA award #HR0011-12-2-0016 and NDSEG fellowship 32 CFR 168a).

Abstract: The increasingly high relative cost of moving data on modern parallel machines has caused a paradigm shift in the design of high-performance algorithms: to achieve efficiency, one must focus on strategies which minimize data movement, rather than minimize arithmetic operations. We call this a communication-avoiding approach to algorithm design. In this work, we derive a new parallel communication-avoiding matrix powers algorithm for matrices of the form $A = D + USV^H$, where $D$ is sparse and $USV^H$ has low rank but may be dense. Matrices of this form arise in many practical applications, including power-law graph analysis, circuit simulation, and algorithms involving hierarchical (H) matrices, such as multigrid methods...
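The matrix form named in this abstract, a sparse D plus a low-rank product U S V^H, can be applied to a vector without ever forming the dense low-rank term. The sketch below shows that idea and uses it to build the vectors x, Ax, A^2 x, and so on, one after another. It is a plain sequential illustration under assumed data layouts (a dict for D, lists of rows for U, S, V), not the parallel communication-avoiding algorithm the paper derives. All function names are hypothetical.

```python
def dense_matvec(M, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def conj_transpose(M):
    """Return the conjugate transpose of a dense matrix (list of rows)."""
    rows, cols = len(M), len(M[0])
    return [[complex(M[i][j]).conjugate() for i in range(rows)] for j in range(cols)]


def sparse_matvec(D, x):
    """Multiply a square sparse matrix stored as {(i, j): value} by a vector."""
    y = [0] * len(x)
    for (i, j), v in D.items():
        y[i] += v * x[j]
    return y


def apply_A(D, U, S, V, x):
    """Compute (D + U S V^H) x as D x + U (S (V^H x)), never forming U S V^H."""
    t = dense_matvec(conj_transpose(V), x)   # r entries, r = rank
    t = dense_matvec(S, t)                   # r entries
    low_rank_part = dense_matvec(U, t)       # n entries
    sparse_part = sparse_matvec(D, x)        # n entries
    return [a + b for a, b in zip(sparse_part, low_rank_part)]


def matrix_powers(D, U, S, V, x, k):
    """Return [x, A x, ..., A^k x], computed by repeated application of A."""
    basis = [list(x)]
    for _ in range(k):
        basis.append(apply_A(D, U, S, V, basis[-1]))
    return basis


if __name__ == "__main__":
    # Tiny made-up example: D = diag(1, 2, 3), rank-1 term couples row 0 to column 1.
    D = {(0, 0): 1, (1, 1): 2, (2, 2): 3}
    U = [[1], [0], [0]]
    S = [[2]]
    V = [[0], [1], [0]]
    for vec in matrix_powers(D, U, S, V, [1, 1, 1], 2):
        print(vec)
```

For an n-by-n matrix with rank r, the low-rank part costs on the order of n*r operations per application this way, instead of the n^2 needed if U S V^H were stored as a dense matrix.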
