Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance

Van Zee, Field G.; van de Geijn, Robert A.; Quintana-Ortí, Gregorio

doi:10.1145/2535371

Cited by 14 publications

(6 citation statements)

References 47 publications


“…Duo uses the matrix diagonalization routines DSYEV or, optionally, DSYEVR from the LAPACK library [112]. The subroutine DSYEVR uses the multiple relatively robust representations algorithm and is expected to be faster than DSYEV, which is based on the QR algorithm [113,114]; however, the current version of DSYEVR is poorly parallelized and therefore not recommended for parallel environments.…”

Section: Computational Considerations (mentioning)

confidence: 99%

Duo: A general program for calculating spectra of diatomic molecules

Yurchenko, Lodi, Tennyson et al. 2016

Computer Physics Communications 187, 221


Duo is a general, user-friendly program for computing rotational, rovibrational and rovibronic spectra of diatomic molecules. Duo solves the Schrödinger equation for the motion of the nuclei not only for the simple case of uncoupled, isolated electronic states (typical for the ground state of closed-shell diatomics) but also for the general case of an arbitrary number and type of couplings between electronic states (typical for open-shell diatomics and excited states). Possible couplings include spin-orbit, angular momenta, spin-rotational and spin-spin. Corrections due to non-adiabatic effects can be accounted for by introducing the relevant couplings using so-called Born-Oppenheimer breakdown curves. Duo requires user-specified potential energy curves and, if relevant, dipole moment, coupling and correction curves. From these it computes energy levels, line positions and line intensities. Several analytic forms plus interpolation and extrapolation options are available for representation of the curves. Duo can refine potential energy and coupling curves to best reproduce reference data such as experimental energy levels or line positions. Duo is provided as a Fortran 2003 program and has been tested under a variety of operating systems.
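The driver choice described in the citation statement above (QR-based DSYEV versus MRRR-based DSYEVR) can be tried from Python as a rough sketch. This is not Duo's code, which is Fortran. It assumes SciPy 1.5 or later, where `scipy.linalg.eigh` accepts a `driver` argument that selects the LAPACK routine. The matrix size, the random test matrix, and the helper name `solve_with_driver` are illustrative assumptions. The sketch only checks that both drivers agree and makes no claim about their relative speed.

```python
import numpy as np
from scipy.linalg import eigh


def solve_with_driver(a, driver):
    """Eigen-decompose the symmetric matrix `a` with the named LAPACK driver.

    `solve_with_driver` is a hypothetical helper for this sketch;
    driver="ev" maps to *SYEV (QR iteration) and driver="evr" to *SYEVR (MRRR).
    """
    return eigh(a, driver=driver)


# Illustrative symmetric test matrix (an assumption, not data from Duo).
rng = np.random.default_rng(0)
m = rng.standard_normal((50, 50))
a = (m + m.T) / 2.0

w_qr, v_qr = solve_with_driver(a, "ev")
w_mrrr, v_mrrr = solve_with_driver(a, "evr")

# Both drivers should return the same spectrum up to rounding error.
eigenvalue_gap = float(np.max(np.abs(w_qr - w_mrrr)))

# Residuals ||A V - V diag(w)|| confirm each decomposition on its own.
residual_qr = float(np.linalg.norm(a @ v_qr - v_qr * w_qr))
residual_mrrr = float(np.linalg.norm(a @ v_mrrr - v_mrrr * w_mrrr))

print(eigenvalue_gap, residual_qr, residual_mrrr)
```

Eigenvectors from the two drivers may differ by sign, so the comparison uses eigenvalues and residuals rather than the vectors themselves.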


“…This is implemented in a Level 2 BLAS-like fashion, where an entire sequence of n Givens rotations is applied to update the entire U and V matrices (using dlasr). Recently, Van Zee, Van de Geijn, and Quintana-Ortí [113] developed a Level 3 BLAS-like implementation of applying Givens rotations, which they found made the SVD using QR iteration competitive with the SVD using D&C (discussed in section 7).…”

Section: QR Iteration (mentioning)

confidence: 99%

The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale

Dongarra, Gates, Haidar et al. 2018

SIAM Rev.


The computation of the singular value decomposition, or SVD, has a long history with many improvements over the years, both in its implementations and algorithmically. Here, we survey the evolution of SVD algorithms for dense matrices, discussing the motivation and performance impacts of changes. There are two main branches of dense SVD methods: bidiagonalization and Jacobi. Bidiagonalization methods started with the implementation by Golub and Reinsch in Algol 60, which was subsequently ported to Fortran in the EISPACK library, and was later more efficiently implemented in the LINPACK library, targeting contemporary vector machines. To address cache-based memory hierarchies, the SVD algorithm was reformulated to use Level 3 BLAS in the LAPACK library. To address new architectures, ScaLAPACK was introduced to take advantage of distributed computing, and MAGMA was developed for accelerators such as GPUs. Algorithmically, the divide and conquer and MRRR algorithms were developed to reduce the number of operations. Still, these methods remained memory bound, so two-stage algorithms were developed to reduce memory operations and increase the computational intensity, with efficient implementations in PLASMA, DPLASMA, and MAGMA. Jacobi methods started with the two-sided method of Kogbetliantz and the one-sided method of Hestenes. They have likewise had many developments, including parallel and block versions and preconditioning to improve convergence. In this paper, we investigate the impact of these changes by testing various historical and current implementations on a common, modern multicore machine and a distributed computing platform. We show that algorithmic and implementation improvements have increased the speed of the SVD by several orders of magnitude, while using up to 40 times less energy.
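The "Level 2 BLAS-like" update mentioned in the citation statement above can be illustrated with a minimal pure-Python sketch: one sweep of plane rotations, where rotation j mixes columns j and j+1 of V. It does not reproduce the interface or options of LAPACK's dlasr. The function names, the matrix size, and the rotation angles are assumptions chosen for illustration.

```python
import math


def identity(n):
    """Return the n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def apply_rotation_sweep(v, rotations):
    """Apply rotation j, given as (c, s), to columns j and j+1 of v in place."""
    for j, (c, s) in enumerate(rotations):
        for row in v:
            a, b = row[j], row[j + 1]
            row[j] = c * a + s * b
            row[j + 1] = -s * a + c * b
    return v


def gram_error(v):
    """Largest entrywise deviation of V^T V from the identity."""
    n = len(v)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(v[k][i] * v[k][j] for k in range(n))
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot - target))
    return worst


n = 6
# Arbitrary illustrative angles; a QR iteration would compute these itself.
rotations = [(math.cos(0.3 * (j + 1)), math.sin(0.3 * (j + 1)))
             for j in range(n - 1)]

v = apply_rotation_sweep(identity(n), rotations)
orthogonality_error = gram_error(v)
print(orthogonality_error)
```

Each rotation reads and writes two full columns while doing only a handful of flops per element, which is why an update organized this way tends to be limited by memory traffic rather than arithmetic.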


“…That is, the Givens rotations may be reordered and applied to V in a blocked fashion. For examples of implementations for applying blocks of Givens rotations, see [Rajamanickam 2009; Van Zee et al. 2013]. If the right block size is chosen, the bandwidth cost of the orthogonal updates can be reduced to $O(n^3/\sqrt{M})$.…”

Section: Algorithm (mentioning)

confidence: 99%

Avoiding Communication in Successive Band Reduction

Ballard, Demmel, Knight 2015

ACM Trans. Parallel Comput.


The running time of an algorithm depends on both arithmetic and communication (i.e., data movement) costs, and the relative costs of communication are growing over time. In this work, we present sequential and parallel algorithms for tridiagonalizing a symmetric band matrix that asymptotically reduce communication compared to previous approaches. The tridiagonalization of a symmetric band matrix is a key kernel in solving the symmetric eigenvalue problem for both full and band matrices. In order to preserve sparsity, tridiagonalization routines use annihilate-and-chase procedures that previously have suffered from poor data locality. We improve data locality by reorganizing the computation and obtain asymptotic improvements. We consider the cases of computing eigenvalues only and of computing eigenvalues and all eigenvectors.
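The statement above that Givens rotations "may be reordered" rests on the fact that rotations acting on disjoint column pairs commute. The pure-Python sketch below applies several sweeps of rotations in the natural sweep-by-sweep order and in a wavefront order, then checks that both give the same V. The wavefront index `w = j + 2*k` is one valid ordering chosen for this illustration. It is not claimed to be the scheme used in the cited implementations. All names, sizes, and angles are assumptions.

```python
import math


def identity(n):
    """Return the n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def rotate_columns(v, j, c, s):
    """Apply one plane rotation to columns j and j+1 of v in place."""
    for row in v:
        a, b = row[j], row[j + 1]
        row[j] = c * a + s * b
        row[j + 1] = -s * a + c * b


def make_rotations(sweeps, n):
    """Build illustrative (c, s) pairs keyed by (sweep k, column j)."""
    rots = {}
    for k in range(sweeps):
        for j in range(n - 1):
            theta = 0.1 + 0.37 * k + 0.11 * j
            rots[(k, j)] = (math.cos(theta), math.sin(theta))
    return rots


def apply_sweep_order(v, rots, sweeps, n):
    """Finish each sweep before starting the next (the natural order)."""
    for k in range(sweeps):
        for j in range(n - 1):
            rotate_columns(v, j, *rots[(k, j)])


def apply_wavefront_order(v, rots, sweeps, n):
    """Order rotations by wave index w = j + 2*k.

    Rotations sharing a wave index are at least two columns apart, so they
    touch disjoint column pairs, and every rotation that (k, j) depends on
    has a smaller wave index.
    """
    last_wave = (n - 2) + 2 * (sweeps - 1)
    for w in range(last_wave + 1):
        for k in range(sweeps):
            j = w - 2 * k
            if 0 <= j < n - 1:
                rotate_columns(v, j, *rots[(k, j)])


n, sweeps = 7, 3
rots = make_rotations(sweeps, n)

v_sweep = identity(n)
apply_sweep_order(v_sweep, rots, sweeps, n)

v_wave = identity(n)
apply_wavefront_order(v_wave, rots, sweeps, n)

max_difference = max(abs(x - y)
                     for row_a, row_b in zip(v_sweep, v_wave)
                     for x, y in zip(row_a, row_b))
print(max_difference)
```

A blocked implementation would exploit this freedom by grouping rotations from several sweeps that touch the same few columns, so those columns are reused while they are still in fast memory.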

