2022
DOI: 10.1109/mcse.2021.3130544

Scalable Many-Core Algorithms for Tridiagonal Solvers

Abstract: How to cite: Please refer to the published version for the most recent bibliographic citation information. If a published version is known, the repository item page linked above will contain details on accessing it.

Citations: cited by 1 publication (2 citation statements)
References: 10 publications
“…Finally, we compare performance on the FPGA to an Nvidia Tesla V100 GPU using the tridiagonal solver library, tridsolver implemented by László et al. [2, 15] using its batched version presented by Reguly et al. [22]. This GPU library has been shown [7] to provide matching or better performance than the two current batch tridiagonal solver functions in Nvidia's cuSPARSE library [6, 27]: cusparse<t>gtsv2StridedBatch() and cusparse<t>gtsvInterleavedBatch(). Our experiments also confirmed these results for the applications evaluated in this paper.…”
Section: Performance Evaluation
mentioning
confidence: 99%
“…Note also that we have selected the number of interleaved systems and interleaved reduced systems to be equal (i.e. $g = g_r$ in relation to (7)). The final term in (15) and (16) are the latencies for processing a batch of $B$ systems.…”
mentioning
confidence: 99%