1998
DOI: 10.1145/285861.285865
The design, implementation, and evaluation of a symmetric banded linear solver for distributed-memory parallel computers

Abstract: This article describes the design, implementation, and evaluation of a parallel algorithm for the Cholesky factorization of symmetric banded matrices. The algorithm is part of IBM's Parallel Engineering and Scientific Subroutine Library version 1.2 and is compatible with ScaLAPACK's banded solver. Analysis, as well as experiments on an IBM SP2 distributed-memory parallel computer, shows that the algorithm efficiently factors banded matrices with wide bandwidth. For example, a 31-node SP2 factors a large matrix …
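As background (not taken from the article itself): the sequential kernel that the paper parallelizes, Cholesky factorization of a symmetric banded matrix, can be sketched with SciPy's LAPACK wrappers. cholesky_banded and cho_solve_banded are standard SciPy routines; the tridiagonal test matrix below is purely illustrative.

```python
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded

# Symmetric positive-definite banded matrix in LAPACK upper-banded
# storage: ab[u + i - j, j] holds A[i, j] on the u superdiagonals.
n, u = 6, 1                  # order and half-bandwidth (tridiagonal here)
ab = np.zeros((u + 1, n))
ab[0, 1:] = -1.0             # superdiagonal of A
ab[1, :] = 2.0               # main diagonal of A

cb = cholesky_banded(ab)     # Cholesky factor, same banded storage
x = cho_solve_banded((cb, False), np.ones(n))   # solve A x = 1
```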

Cited by 13 publications (8 citation statements)
References 8 publications
“…That is, the algorithms are asymptotically work-efficient only with smaller numbers of processors, P = O(nm/r²) for the triangular solver and P = O(n²/(r² log n)). This behavior is common to linear-algebra algorithms with long critical paths, such as triangular solvers by substitution and triangular and orthogonal factorizations (see, e.g., [9]). …”
Section: Generalizations and Implementation (mentioning)
confidence: 94%
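The processor bounds in the statement above are an instance of the general work-depth tradeoff; a minimal sketch of the standard argument (Brent's scheduling principle, not specific to the cited papers):

```latex
% With total work $W$ and critical-path length (depth) $D$, greedy
% scheduling on $P$ processors runs in time
T_P \;\le\; \frac{W}{P} + D .
% The algorithm stays work-efficient, $T_P = O(W/P)$, only while the
% $W/P$ term dominates, i.e. for $P = O(W/D)$; a long critical path
% therefore caps the number of processors that can be used efficiently.
```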
“…For computations where data is reused many times, this technique reduces memory traffic to slower memories in the hierarchy [Hennessy and Patterson 2007]. The cache blocking technique has been extensively applied to linear algebra applications [Dongarra et al. 1990; Anderson et al. 1999; Kågström et al. 1998; Gupta et al. 1998; Goto and van de Geijn 2008; Agarwal et al. 1994a]. Since accessing data from a slower memory is expensive, an algorithm that rarely goes to slower memory performs better. …”
Section: Memory Hierarchies (mentioning)
confidence: 99%
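To illustrate the cache-blocking technique the statement refers to (a generic sketch, not code from any of the cited libraries), a blocked matrix multiply loads each tile once per block step and reuses it many times before eviction:

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """C = A @ B computed tile by tile. Each block-by-block tile of A
    and B is reused about `block` times once loaded, cutting traffic
    to slower levels of the memory hierarchy by roughly that factor."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for ii in range(0, n, block):          # tiles of rows of C
        for kk in range(0, k, block):      # inner (reduction) tiles
            for jj in range(0, m, block):  # tiles of columns of C
                C[ii:ii+block, jj:jj+block] += (
                    A[ii:ii+block, kk:kk+block] @ B[kk:kk+block, jj:jj+block]
                )
    return C
```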
“…This allows for a reduction of the computational cost related to the construction of the elimination tree. Different implementations of the multi-frontal solver algorithm also exist that target specific architectures (see, e.g., [13,14,15]). There is also a direct solver with linear computational cost based on the use of H-matrices [27] with compressed off-diagonal blocks. …”
Section: Introduction (mentioning)
confidence: 99%
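For context on the elimination tree mentioned above, Liu's classical algorithm computes it directly from the lower-triangular nonzero pattern; the sketch below is the textbook formulation, not any of the cited implementations, and the example pattern is made up:

```python
def elimination_tree(row_pattern):
    """Liu's algorithm. row_pattern[k] lists the columns i < k with
    A[k, i] != 0 (lower triangle, by row). Returns parent[], where
    parent[v] is v's parent in the elimination tree (-1 at a root)."""
    n = len(row_pattern)
    parent = [-1] * n
    ancestor = [-1] * n                # path-compressed ancestor links
    for k in range(n):
        for i in row_pattern[k]:
            j = i
            while j != -1 and j != k:  # climb from i toward the root
                jnext = ancestor[j]
                ancestor[j] = k        # compress the path toward k
                if jnext == -1:
                    parent[j] = k      # j had no parent yet: it is k
                j = jnext
    return parent

# Illustrative pattern: two chains (0->2, 1->3) merging at column 4.
print(elimination_tree([[], [], [0], [1], [2, 3]]))  # [2, 3, 4, 4, -1]
```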