Proceedings of the 1996 ACM/IEEE Conference on Supercomputing
DOI: 10.1145/369028.369092
Sparse LU factorization with partial pivoting on distributed memory machines

Abstract: A sparse LU factorization based on Gaussian elimination with partial pivoting (GEPP) is important to many scientific applications, but it is still an open problem to develop a high-performance GEPP code on distributed memory machines. The main difficulty is that partial pivoting operations dynamically change the computation and nonzero fill-in structures during the elimination process. This paper presents an approach called S* for parallelizing this problem on distributed memory machines. The S* approach …

Cited by 16 publications (25 citation statements)
References 29 publications
“…In the previous work, we show that static factorization does not produce too many fill-ins for most of our test matrices, even for large matrices using a simple matrix ordering strategy (minimum degree ordering) [10,11]. For a few matrices that we have tested, static factorization generates an excessive number of fill-ins.…”
Citation type: mentioning (confidence: 75%)
“…Most notably, the recent shared memory implementation of SuperLU has achieved up to 2.58 GFLOPS on 8 Cray C90 nodes [4,5,23]. For distributed memory machines, we proposed an approach that adopts a static symbolic factorization scheme to avoid data structure variation [10,11]. Static symbolic factorization eliminates the runtime overhead of dynamic symbolic factorization with a price of overestimated fill-ins and, thereafter, extra computation [15].…”
Section: AMS Subject Classifications 65F50, 65F05
Citation type: mentioning (confidence: 99%)
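The general idea behind static symbolic factorization, as described in the statement above, can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper's actual algorithm: because partial pivoting may bring any candidate row into the pivot position, the static scheme unions the structures of all candidate rows at each elimination step, which upper-bounds, and typically overestimates, the true fill-in for any pivot sequence.

```python
# Toy sketch (assumption: not the cited paper's implementation) of static
# symbolic factorization. Since partial pivoting may swap any candidate row
# into the pivot position, the static scheme takes the union of all candidate
# row structures at each step, so the predicted nonzero pattern covers every
# possible pivot choice, at the cost of overestimated fill-in.
def static_symbolic_fill(pattern, n):
    """pattern: set of (row, col) nonzero positions of an n x n matrix.
    Returns an enlarged pattern that is valid for any pivot sequence."""
    fill = set(pattern)
    for k in range(n):
        # Candidate pivot rows: any row i >= k with a nonzero in column k.
        cands = [i for i in range(k, n) if (i, k) in fill]
        # Union of the candidates' structures over columns k..n-1.
        union_cols = {j for i in cands for j in range(k, n) if (i, j) in fill}
        # Every candidate row may receive that whole union as fill.
        for i in cands:
            for j in union_cols:
                fill.add((i, j))
    return fill
```

The trade-off mirrors the quoted passage: no runtime symbolic work is needed during numerical factorization, but some predicted nonzeros never materialize, costing extra storage and computation.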
“…We examine how the computation-dominating part of the LU algorithm is efficiently implemented using the level of

    Perform Update2D(k, m) for blocks this processor owns
    (10) if column block m is not factorized and all m's child supernodes have been factorized then
    (11)     Perform Factor(m) for blocks this processor owns
    (12) endif
    (13) for j = m + 1 to N
    (14)     if my cno = j mod pc then
    (15)         Perform Update2D(k, j) for blocks this processor owns
    (16)     endif
    (17) endfor
    (18) endfor

There could be several approaches to circumvent the above problem: One approach is to use the mixture of BLAS-1/2/3 routines. If A_ik and A_ij have the same row sparse structure, and A_kj and A_ij have the same column sparse structure, BLAS-3 GEMM can be directly used to modify A_ij.…”
Section: Implementation with Supernodal GEMM Kernel
Citation type: mentioning (confidence: 99%)
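The structure-matching condition quoted above is what lets a supernodal block update collapse into one dense matrix multiply. A minimal sketch, assuming the matching nonzeros have already been packed into dense arrays (the names `update2d_gemm`, `A_ij`, `A_ik`, `A_kj` are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical sketch: when block A_ik shares its row sparse structure with
# A_ij, and A_kj shares its column sparse structure with A_ij, the nonzeros
# can be packed into dense matrices and the Update2D step becomes a single
# BLAS-3 GEMM call instead of a mixture of BLAS-1/2 operations.
def update2d_gemm(A_ij, A_ik, A_kj):
    """Schur-complement update of supernodal block A_ij, in place."""
    A_ij -= A_ik @ A_kj  # one dense matrix-matrix multiply (DGEMM)
    return A_ij
```

When the structures do not match, the fallback is the mixed BLAS-1/2/3 path the statement mentions, which trades the single large GEMM for several smaller, less cache-friendly operations.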
“…Our previous study [8,10] shows that even with the introduction of extra nonzero elements by static symbolic factorization, the performance of the S* sequential code can still be competitive with SuperLU because we are able to use more BLAS-3 operations. The improvement ratio in terms of MFLOPS (Table 3) varies from 16% to 116%, on average more than 50%.…”
Section: Overall Code Performance
Citation type: mentioning (confidence: 99%)