2016
DOI: 10.1145/2830568

Manycore Algorithms for Batch Scalar and Block Tridiagonal Solvers

Abstract: Engineering, scientific and financial applications often require the simultaneous solution of a large number of independent tridiagonal systems of equations with varying coefficients. Since the number of systems is large enough to offer considerable parallelism on many-core systems, the choice between different tridiagonal solution algorithms, such as Thomas, CR (Cyclic Reduction) or PCR (Parallel Cyclic Reduction), needs to be reexamined. This work investigates the optimal choice of tridiagonal algorithm for C…
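
As a rough illustration of the problem the abstract describes, the following is a minimal sketch of the Thomas algorithm applied to a batch of independent scalar tridiagonal systems. It is not the paper's implementation: the function names `thomas_solve` and `solve_batch` and the list-based storage are assumptions made for this example, and the solver assumes well-conditioned (for instance, diagonally dominant) systems, so no pivoting is performed.

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] is unused)
    b: main diagonal
    c: super-diagonal (c[-1] is unused)
    d: right-hand side
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_batch(systems):
    """Solve independent systems one after another.

    Each element of `systems` is a tuple (a, b, c, d). Because the systems
    do not depend on each other, this loop is the part that a many-core
    implementation could distribute across threads.
    """
    return [thomas_solve(a, b, c, d) for (a, b, c, d) in systems]
```

Each system costs a number of operations proportional to its length, but the two sweeps within a single system are sequential. In this sketch, the parallelism comes only from the independence of the systems in the batch.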

Citations: cited by 18 publications (14 citation statements)
References: 22 publications (22 reference statements)

“…In this article we investigate the state-of-the-art in multicore/many-core algorithms for tridiagonal solvers for distributed-memory systems and re-examine the algorithmic trade-offs required at increasing machine scale to achieve good performance. The insights lead to the development of a new, highly scalable implementation extending the single-node work of László et al [1].…”
Section: Introduction (mentioning)
confidence: 94%
“…They offer significant opportunities for exploiting the massive parallelism available on modern multicore CPU and many-core GPU devices. With the advent of such hardware, recent work [1] reexamined the choice between different tridiagonal solution algorithms (Thomas [2], PCR [3] and Hybrid). However, many real-world problems require such algorithms to work efficiently over multiple CPU/GPU devices due to the need for compute and memory resources beyond a single node.…”
Section: Introduction (mentioning)
confidence: 99%
“…Since the Hines matrices are well-conditioned by definition, the computationally expensive operations like pivoting are not necessary. We can find multiple works for fast resolution on well-conditioned tridiagonal systems, such as the aforementioned work of Zhang et al [31] and the work by László et al [32]. Both works based on the use of a hybrid method to solve the tridiagonal systems, the first one is based on PCR-CR and the second one is based on PCR-Thomas.…”
Section: Related Work (mentioning)
confidence: 99%
“…Another single-GPU implementation for the block-tridiagonal case was done in [30], whose authors tested a variety of block and matrix sizes, showing that better performance is obtained with systems with relatively large block sizes by better utilizing the available GPU threads. Recently, a comparison of the classical solvers, including the Thomas algorithm (a specialized Gaussian elimination for tridiagonal systems), was addressed in [31], implementing them on CPU, many integrated cores (MIC) architecture and GPU accelerators for the case of using a single node.…”
Section: Implementations (mentioning)
confidence: 99%