Analysis of the Truncated SPIKE Algorithm

Abstract. The ScaLAPACK library contains a pair of routines for solving banded linear systems which are strictly diagonally dominant by rows. Mathematically, the algorithm is complete block cyclic reduction corresponding to a particular block partitioning of the system. In this paper we extend Heller's analysis of incomplete cyclic reduction for block tridiagonal systems to the ScaLAPACK case. We obtain a tight estimate on the significance of the off-diagonal blocks of the tridiagonal linear systems generated by the cyclic reduction algorithm. Numerical experiments illustrate the advantage of omitting all but the first reduction step for a class of matrices related to high-order approximations of the Laplace operator.


“…The existence and uniqueness of the matrix K q is immediate. Moreover, Corollary 3.2 [6] implies that…”

Section: The Main Results


Incomplete Cyclic Reduction of Banded and Strictly Diagonally Dominant Linear Systems

Mikkelsen, Kågström. 2012. Parallel Processing and Applied Mathematics.


“…Let T denote the main block diagonal of R, see Figure 4. Mikkelsen and Manguoglu [5] showed that ‖T − R‖∞ ≤ q when A is banded and strictly diagonally dominant by rows. In this paper we consider the significance of the off diagonal blocks relative to the main block diagonal.…”

Section: Now Consider the Solution Of A Block Tridiagonal Linear System


The Explicit Spike Algorithm: Iterative Solution of the Reduced System

Mikkelsen. 2012. High-Performance Scientific Computing.


Dedicated to Ahmed Sameh on the occasion of his 70th birthday.

Summary. The explicit SPIKE algorithm applies to narrow banded linear systems which are strictly diagonally dominant by rows. The parallel bottleneck is the solution of the so-called reduced system which is block tridiagonal and strictly diagonally dominant by rows. The reduced system can be solved iteratively using the truncated reduced system matrix as a preconditioner. In this paper we derive a tight estimate for the quality of this preconditioner.
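The idea in this summary can be illustrated with a small dense NumPy sketch. It is not the authors' implementation. It assumes a hypothetical tridiagonal Toeplitz test matrix (diagonal 4, off-diagonals -1) split into p partitions of size m, forms the SPIKE matrix S = D^{-1} A, extracts the reduced system R on the boundary unknowns, and builds a truncated matrix T by dropping the far spike tips. The function names, the test matrix, and the choice of a preconditioned Richardson iteration are all assumptions made for illustration.

```python
import numpy as np

def spike_reduced_system(p=4, m=12, diag=4.0, off=-1.0):
    """Return (R, T) for a hypothetical tridiagonal Toeplitz test matrix."""
    n = p * m
    A = diag * np.eye(n) + off * np.eye(n, k=1) + off * np.eye(n, k=-1)
    # D keeps only the p diagonal blocks of A; S = D^{-1} A is the SPIKE matrix.
    D = np.zeros_like(A)
    for i in range(p):
        s = slice(i * m, (i + 1) * m)
        D[s, s] = A[s, s]
    S = np.linalg.solve(D, A)
    # Boundary unknowns: the first and last index of every partition.
    bidx = np.array([j for i in range(p) for j in (i * m, (i + 1) * m - 1)])
    R = S[np.ix_(bidx, bidx)]
    # Truncation: keep the unit diagonal and the coupling across each
    # interface (last row of partition i with first row of partition i+1);
    # the far spike tips are dropped.
    T = np.eye(2 * p)
    for i in range(p - 1):
        a, b = 2 * i + 1, 2 * i + 2
        T[a, b], T[b, a] = R[a, b], R[b, a]
    return R, T

def preconditioned_richardson(R, T, g, tol=1e-12, max_iter=50):
    """Solve R x = g by x <- x + T^{-1} (g - R x); returns (x, iterations)."""
    x = np.zeros_like(g)
    for k in range(max_iter):
        r = g - R @ x
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(g, np.inf):
            return x, k
        x = x + np.linalg.solve(T, r)
    return x, max_iter

R, T = spike_reduced_system()
g = np.ones(R.shape[0])
x, iters = preconditioned_richardson(R, T, g)
print("||T - R||_inf =", np.linalg.norm(T - R, np.inf))
print("iterations    =", iters)
```

For this strongly diagonally dominant test matrix, T differs from R only by the spike tips, so the iteration should need only a handful of steps. A production solver would work with banded storage and never form S explicitly.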


“…For such systems, since spikes decay as one moves away from the main diagonal, a truncated variation of the algorithm is used. An extensive error analysis for the Truncated Spike algorithm is given in [29].…”

Section: The Spike Linear System Solver
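The spike decay mentioned in the quoted passage can be observed numerically. The sketch below is an illustration under stated assumptions, not code from any of the cited papers: it takes one diagonal block of a hypothetical tridiagonal matrix (diagonal 4, off-diagonals -1) and computes the right spike, that is, the solution of the block system whose right-hand side is the single coupling entry to the next partition. The helper name `right_spike` is made up for this example.

```python
import numpy as np

def right_spike(m=20, diag=4.0, off=-1.0):
    """Right spike of one m-by-m diagonal block (hypothetical test matrix)."""
    # Diagonal block of a tridiagonal, strictly diagonally dominant matrix.
    A1 = diag * np.eye(m) + off * np.eye(m, k=1) + off * np.eye(m, k=-1)
    # The coupling to the next partition enters only in the last row.
    b = np.zeros(m)
    b[-1] = off
    return np.linalg.solve(A1, b)

v = right_spike()
# The entry next to the interface is the largest in magnitude; the entry at
# the far end of the partition (the "tip") is many orders of magnitude smaller.
print("bottom of spike:", abs(v[-1]))
print("tip of spike:   ", abs(v[0]))
```

The truncated variant neglects the tip, which is reasonable only when the decay is this fast; for weakly diagonally dominant matrices or very small partitions the tip may not be negligible.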


“…Motivated by need for extreme scalability and the deep memory hierarchies of current platforms, we have developed the next generation of hybrid solvers: the Spike family of algorithms [3,5,10,22,29,30,31,32]. The Spike solver toolkit is specifically designed for banded systems (potentially sparse within the band).…”

Section: Introduction


Performance Models for the Spike Banded Linear System Solver

Manguoğlu, Saied, Sameh, et al. 2011. Scientific Programming.


Abstract. With the availability of large-scale parallel platforms comprised of tens of thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners, compared to the state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm, is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent prediction capabilities of our model, based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters, platforms for which performance prediction is particularly challenging. Finally, we predict the scalability of the Spike algorithm using up to 65,536 cores with our model. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.


Cited by 25 publications

References 13 publications

