2014
DOI: 10.1145/2535371

Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance

Abstract: We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly algorithm for applying multiple sets of Givens rotations to the eigenvector/singular vector matrix. This algorithm is then implemented with optimizations that (1) leverage vector instruction units to increase floating-point throughput, and (2) fuse multiple rotations to decrease the total number …

Cited by 14 publications (6 citation statements)
References 47 publications
“…Duo uses the matrix diagonalization routines DSYEV or, optionally, DSYEVR from the LAPACK library [112]. The subroutine DSYEVR uses the multiple relatively robust representations algorithm and is expected to be faster than DSYEV, which is based on the QR algorithm [113,114]; however, the current version of DSYEVR is poorly parallelized and therefore not recommended for parallel environments.…”
Section: Computational Considerations (mentioning)
confidence: 99%
“…This is implemented in a Level 2 BLAS-like fashion, where an entire sequence of n Givens rotations is applied to update the entire U and V matrices (using dlasr). Recently, Van Zee, Van de Geijn, and Quintana-Ortí [113] developed a Level 3 BLAS-like implementation of applying Givens rotations, which they found made the SVD using QR iteration competitive with the SVD using D&C (discussed in section 7).…”
Section: QR Iteration (mentioning)
confidence: 99%
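
To make the "Level 2 BLAS-like" update in the statement above concrete, here is a minimal sketch in plain Python of applying one sequence of Givens rotations to the columns of a matrix. It is only an illustration in the spirit of what the statement describes, not the LAPACK routine dlasr itself; the function names `givens` and `apply_rotation_sequence`, and the convention that rotation k mixes columns k and k+1, are assumptions made for this example.

```python
import math

def givens(a, b):
    """Return (c, s) with c*a + s*b = r and -s*a + c*b = 0 (hypothetical helper)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to eliminate; use the identity rotation
    return a / r, b / r

def apply_rotation_sequence(V, rotations):
    """Apply rotation k to columns k and k+1 of V (a list of rows), in place.

    Every rotation sweeps over all rows of V, so the whole matrix is
    streamed through memory once per rotation.
    """
    for k, (c, s) in enumerate(rotations):
        for row in V:
            x, y = row[k], row[k + 1]
            row[k] = c * x + s * y
            row[k + 1] = -s * x + c * y
    return V

# Usage: rotate the 3x3 identity by two rotations; the result stays orthogonal.
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
apply_rotation_sequence(V, [givens(3.0, 4.0), givens(1.0, 1.0)])
```

Each rotation performs only a handful of floating-point operations per matrix element it loads, which is why this formulation tends to be limited by memory bandwidth rather than arithmetic.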
“…That is, the Givens rotations may be reordered and applied to V in a blocked fashion. For examples of implementations for applying blocks of Givens rotations, see [Rajamanickam 2009; Van Zee et al. 2013]. If the right block size is chosen, the bandwidth cost of the orthogonal updates can be reduced to $O(n^3/\sqrt{M})$.…”
Section: Algorithm (mentioning)
confidence: 99%
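
The reordering idea in the statement above can be sketched with a deliberately simple form of blocking. Because each rotation treats every row of V independently, all rotation sets can be applied to one panel of rows before moving to the next, so a panel that fits in cache is reused many times. This is an assumed, simplified illustration in plain Python, not the algorithm of either cited implementation; the names `rotate_rows`, `apply_unblocked`, `apply_row_blocked`, and the parameter `panel_rows` are hypothetical.

```python
import math

def rotate_rows(rows, rotation_sets):
    """Apply every rotation set, in order, to the given rows in place.

    Within a set, rotation k mixes columns k and k+1 (assumed convention).
    """
    for rotations in rotation_sets:
        for k, (c, s) in enumerate(rotations):
            for row in rows:
                x, y = row[k], row[k + 1]
                row[k] = c * x + s * y
                row[k + 1] = -s * x + c * y

def apply_unblocked(V, rotation_sets):
    """Baseline: each rotation sweeps over the entire matrix."""
    rotate_rows(V, rotation_sets)
    return V

def apply_row_blocked(V, rotation_sets, panel_rows=64):
    """Row-panel blocking: finish all rotations on one panel before the next.

    The slice copies only the list of row references, so the rows of V
    themselves are updated in place.
    """
    for start in range(0, len(V), panel_rows):
        rotate_rows(V[start:start + panel_rows], rotation_sets)
    return V

# Usage: both orderings perform the same arithmetic on each row.
n = 6
sets = [[(math.cos(0.1 * (j + k + 1)), math.sin(0.1 * (j + k + 1)))
         for k in range(n - 1)] for j in range(3)]
A = [[float(i * n + j) for j in range(n)] for i in range(7)]
B = [row[:] for row in A]
apply_unblocked(A, sets)
apply_row_blocked(B, sets, panel_rows=2)
```

The two functions differ only in traversal order. In a compiled implementation the choice of panel size relative to the cache size is what would determine how much memory traffic is saved; pure Python will not show that effect.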