1998
DOI: 10.1145/285861.285865
The design, implementation, and evaluation of a symmetric banded linear solver for distributed-memory parallel computers

Abstract: This article describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of symmetric banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, shows that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix …
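As background (not taken from the article itself): the sequential kernel that the paper parallelizes, Cholesky factorization of a symmetric banded matrix, can be sketched with SciPy's LAPACK wrappers. cholesky_banded and cho_solve_banded are standard SciPy routines; the tridiagonal test matrix below is purely illustrative.

```python
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded

# Symmetric positive-definite banded matrix in LAPACK upper-banded
# storage: ab[u + i - j, j] holds A[i, j] on the u superdiagonals.
n, u = 6, 1                  # order and half-bandwidth (tridiagonal here)
ab = np.zeros((u + 1, n))
ab[0, 1:] = -1.0             # superdiagonal of A
ab[1, :] = 2.0               # main diagonal of A

cb = cholesky_banded(ab)     # Cholesky factor, same banded storage
x = cho_solve_banded((cb, False), np.ones(n))   # solve A x = 1
```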

Cited by 13 publications (8 citation statements)
References 8 publications
“…That is, the algorithms are asymptotically work-efficient only with smaller numbers of processors, P = O(nm/r²) for the triangular solver and P = O(n²/(r² log n)). This behavior is common to linear-algebra algorithms with long critical paths, such as triangular solvers by substitution and triangular and orthogonal factorizations (see, e.g., [9]). …”
Section: Generalizations and Implementation (mentioning)
confidence: 94%
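The processor bounds in the statement above are an instance of the general work-depth tradeoff; a minimal sketch of the standard argument (Brent's scheduling principle, not specific to the cited papers):

```latex
% With total work $W$ and critical-path length (depth) $D$, greedy
% scheduling on $P$ processors runs in time
T_P \;\le\; \frac{W}{P} + D .
% The algorithm stays work-efficient, $T_P = O(W/P)$, only while the
% $W/P$ term dominates, i.e. for $P = O(W/D)$; a long critical path
% therefore caps the number of processors that can be used efficiently.
```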
“…For computations where data is reused many times, this technique reduces memory traffic to slower memories in the hierarchy [Hennessy and Patterson 2007]. The cache blocking technique has been extensively applied to linear algebra applications [Dongarra et al. 1990; Anderson et al. 1999; Kågström et al. 1998; Gupta et al. 1998; Goto and van de Geijn 2008; Agarwal et al. 1994a]. Since accessing data from a slower memory is expensive, an algorithm that rarely goes to slower memory performs better. …”
Section: Memory Hierarchies (mentioning)
confidence: 99%
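To illustrate the cache-blocking technique the statement refers to (a generic sketch, not code from any of the cited libraries), a blocked matrix multiply loads each tile once per block step and reuses it many times before eviction:

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """C = A @ B computed tile by tile. Each block-by-block tile of A
    and B is reused about `block` times once loaded, cutting traffic
    to slower levels of the memory hierarchy by roughly that factor."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for ii in range(0, n, block):          # tiles of rows of C
        for kk in range(0, k, block):      # inner (reduction) tiles
            for jj in range(0, m, block):  # tiles of columns of C
                C[ii:ii+block, jj:jj+block] += (
                    A[ii:ii+block, kk:kk+block] @ B[kk:kk+block, jj:jj+block]
                )
    return C
```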
“…This allows for a reduction of the computational cost related to the construction of the elimination tree. Different implementations of the multi-frontal solver algorithm also exist that target specific architectures (see, e.g., [13,14,15]). There is also a direct solver with linear computational cost based on the use of H-matrices [27] with compressed off-diagonal blocks. …”
Section: Introduction (mentioning)
confidence: 99%
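For context on the elimination tree mentioned above, Liu's classical algorithm computes it directly from the lower-triangular nonzero pattern; the sketch below is the textbook formulation, not any of the cited implementations, and the example pattern is made up:

```python
def elimination_tree(row_pattern):
    """Liu's algorithm. row_pattern[k] lists the columns i < k with
    A[k, i] != 0 (lower triangle, by row). Returns parent[], where
    parent[v] is v's parent in the elimination tree (-1 at a root)."""
    n = len(row_pattern)
    parent = [-1] * n
    ancestor = [-1] * n                # path-compressed ancestor links
    for k in range(n):
        for i in row_pattern[k]:
            j = i
            while j != -1 and j != k:  # climb from i toward the root
                jnext = ancestor[j]
                ancestor[j] = k        # compress the path toward k
                if jnext == -1:
                    parent[j] = k      # j had no parent yet: it is k
                j = jnext
    return parent

# Illustrative pattern: two chains (0->2, 1->3) merging at column 4.
print(elimination_tree([[], [], [0], [1], [2, 3]]))  # [2, 3, 4, 4, -1]
```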