Block-oriented J-Jacobi methods for Hermitian matrices

Hari, Vjeran; Singer, Sanja

doi:10.1016/j.laa.2010.06.032

Cited by 18 publications

(31 citation statements)

References 27 publications


“…The most promising way to further enhance these characteristics is to modify them to become BLAS 3 algorithms (see [12,28,31,33,34,52,53]). Such methods are referred to as block diagonalization or block Jacobi-type methods.…”

mentioning

confidence: 99%

Convergence to diagonal form of block Jacobi-type methods

Hari

2014

Numer. Math.

Self Cite


We provide sufficient conditions for the general sequential block Jacobi-type method to converge to the diagonal form for cyclic pivot strategies which are weakly equivalent to the column-cyclic strategy. Given a block-matrix partition $(A_{ij})$ of a square matrix $A$, the paper analyzes the iterative process of the form […], where $P^{(k)}$ and $Q^{(k)}$ are elementary block matrices which differ from the identity matrix in four blocks: two diagonal and the two corresponding off-diagonal blocks. In our analysis of convergence a promising new tool is used, namely, the theory of block Jacobi operators. Typical applications lie in proving the global convergence of block Jacobi-type methods for solving standard and generalized eigenvalue and singular value problems.
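The elementary block matrices described in this abstract can be illustrated with a minimal sketch. It assumes a partition given by a list of block sizes and a square pivot matrix of order n_i + n_j; the function name `elementary_block_matrix` and its parameters are hypothetical and are not taken from the paper.

```python
def elementary_block_matrix(block_sizes, i, j, pivot):
    """Embed `pivot` into an identity matrix at blocks (i,i), (i,j), (j,i), (j,j).

    Hypothetical helper for illustration: `block_sizes` defines the block
    partition, and `pivot` is a square matrix (list of rows) of order
    block_sizes[i] + block_sizes[j].
    """
    # Offsets of each block inside the full matrix.
    offsets = [0]
    for size in block_sizes:
        offsets.append(offsets[-1] + size)
    n = offsets[-1]

    assert i < j, "pivot pair must satisfy i < j"
    assert len(pivot) == block_sizes[i] + block_sizes[j], "pivot has wrong order"

    # Global indices covered by block i followed by block j.
    idx = (list(range(offsets[i], offsets[i + 1]))
           + list(range(offsets[j], offsets[j + 1])))

    # Start from the identity and overwrite only the four pivot blocks.
    e = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for r_local, r in enumerate(idx):
        for c_local, c in enumerate(idx):
            e[r][c] = pivot[r_local][c_local]
    return e
```

With block sizes `[1, 2, 1]` and the pivot pair `(0, 2)`, only the four corner entries of the resulting 4 x 4 matrix differ from the identity; the middle 2 x 2 block stays untouched.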


“…The new strategies may be seen as the generalizations of the Mantharam-Eberlein block-recursive strategy [27] to all even matrix orders. These new strategies are combined with the multilevel blocking and parallelization techniques explored in [20,21,37,36,29], to deliver the Jacobi-type (H)SVD algorithms for the graphics processing unit(s), competitive with the leading hybrid (CPU + GPU) alternatives, like MAGMA. The new algorithms are carefully designed to use a CPU primarily as a controlling unit.…”

Section: A Multi-gpu Algorithm

mentioning

confidence: 99%

“…GPUs instead offer a complex memory hierarchy, with different access speeds and patterns, and both automatically and programmatically managed caches. Even more so than in the CPU world, a (less) careful hardware-adapted blocking of a GPU algorithm is the key technique by which considerable speedups are gained (or lost). After the introductory paper [29], here we present a family of the full block [21] and the block-oriented [20] one-sided Jacobi-type algorithm variants for the ordinary (SVD) and the hyperbolic singular value decomposition (HSVD) of a matrix, targeting both a single and the multiple GPUs. The blocking of our algorithm follows the levels of the GPU memory hierarchy; namely, the innermost level of blocking tries to maximize the amount of computation done inside the fastest (and smallest) memory of the registers and manual caches.…”

mentioning

confidence: 99%

“…After the introductory paper [29], here we present a family of the full block [21] and the block-oriented [20] one-sided Jacobi-type algorithm variants for the ordinary (SVD) and the hyperbolic singular value decomposition (HSVD) of a matrix, targeting both a single and the multiple GPUs. The blocking of our algorithm follows the levels of the GPU memory hierarchy; namely, the innermost level of blocking tries to maximize the amount of computation done inside the fastest (and smallest) memory of the registers and manual caches.…”

mentioning

confidence: 99%

“…The following variants are advisable: block-oriented variant (see [20]), when the communication (or memory access) overhead between the tasks is negligible compared to the computational costs, and full block variant (see [21]), otherwise.…”

mentioning

confidence: 99%


A Hierarchically Blocked Jacobi SVD Algorithm for Single and Multiple Graphics Processing Units

Novaković

2015

SIAM J. Sci. Comput.


Abstract. We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on GPU's shared address space, but applicable also to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single GPU setting needs a CPU for the controlling purposes only, while utilizing GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card.

Key words. Jacobi (H)SVD, parallel pivot strategies, graphics processing units

AMS subject classifications. 65Y05, 65Y10, 65F15

1. Introduction. Graphics processing units have become a widely accepted tool of parallel scientific computing, but many of the established algorithms still need to be redesigned with massive parallelism in mind. Instead of multiple CPU cores, which are fully capable of simultaneously processing different operations, GPUs are essentially limited to many concurrent instructions of the same kind, a paradigm known as SIMT (single-instruction, multiple-threads) parallelism.

SIMT type of parallelism is not the only reason for the redesign. Modern CPU algorithms rely on (mostly automatic) multi-level cache management for speedup. GPUs instead offer a complex memory hierarchy, with different access speeds and patterns, and both automatically and programmatically managed caches. Even more so than in the CPU world, a (less) careful hardware-adapted blocking of a GPU algorithm is the key technique by which considerable speedups are gained (or lost).

After the introductory paper [29], here we present a family of the full block [21] and the block-oriented [20] one-sided Jacobi-type algorithm variants for the ordinary (SVD) and the hyperbolic singular value decomposition (HSVD) of a matrix, targeting both a single and the multiple GPUs. The blocking of our algorithm follows the levels of the GPU memory hierarchy; namely, the innermost level of blocking tries to maximize the amount of computation done inside the fastest (and smallest) memory of the registers and manual caches. The GPU's global RAM and caches are considered by the mid-level, while inter-GPU communication and synchronization are among the issues addressed by the outermost level of blocking.

At each blocking level an instance of either the block-oriented or the full block Jacobi (H)SVD is run, orthogonalizing pivot columns or block-columns by conceptually the same algorithm at the lower level. Thus, the overall structure of the algorithm is hierarchical (or recursive) in nature, and ready to fit not only the cur...
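The column orthogonalization that a one-sided Jacobi method performs on each pivot pair can be sketched as follows. This is a plain, unblocked, sequential (Hestenes-type) variant with a row-cyclic pivot order, written in pure Python for illustration only. It is not the hierarchically blocked GPU algorithm of the paper, and the function name, tolerance, and sweep limit are assumptions.

```python
import math

def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=30):
    """Singular values of `a` (list of rows), sorted in descending order.

    Illustrative unblocked one-sided Jacobi: plane rotations are applied
    to pairs of columns until all pairs are numerically orthogonal.
    """
    m, n = len(a), len(a[0])
    # Store the matrix column-wise so each rotation touches two lists.
    cols = [[float(a[i][j]) for i in range(m)] for j in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):              # row-cyclic pivot order
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                # Skip pairs that are already orthogonal to working accuracy.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Rotation angle chosen so the new columns are orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    x, y = cols[p][i], cols[q][i]
                    cols[p][i] = c * x - s * y
                    cols[q][i] = s * x + c * y
        if not rotated:                     # a full sweep without rotations
            break

    # After convergence the column norms are the singular values.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

In a blocked variant of the kind the abstract describes, the pivot pair would consist of block-columns rather than single columns, with the same idea applied recursively at the lower level.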


On the global convergence of the block Jacobi method for the positive definite generalized eigenvalue problem

Hari

2021

Calcolo
