2010
DOI: 10.1109/tpds.2009.79

Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures

Abstract: The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is very often used as a pre-processing step for calculating the Singular Value Decomposition. Furthermore, in the Top500 list of June 2008, 98% of the fastest parallel systems in the world were based on multicores. Th…
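To make the role of bidiagonal reduction concrete, here is a minimal sketch of the classical unblocked Golub-Kahan reduction with Householder reflectors, written in plain Python. It is not the tile algorithm of the paper and it reduces to full bidiagonal form rather than band bidiagonal form; the function names (`householder`, `bidiagonalize`, `frobenius`, `max_off_bidiagonal`) are illustrative assumptions, and the input is assumed to have at least as many rows as columns.

```python
import math


def householder(x):
    """Return a unit vector v such that (I - 2 v v^T) x is a multiple of e_1.

    Returns None when x is the zero vector (nothing to eliminate).
    """
    norm_x = math.sqrt(sum(t * t for t in x))
    if norm_x == 0.0:
        return None
    # Pick the sign that avoids cancellation when forming v[0].
    alpha = -math.copysign(norm_x, x[0])
    v = list(x)
    v[0] -= alpha
    norm_v = math.sqrt(sum(t * t for t in v))
    return [t / norm_v for t in v]


def bidiagonalize(a):
    """Reduce an m x n matrix (m >= n, list of rows) to upper bidiagonal form.

    Illustrative, unblocked sketch: reflectors are applied and discarded,
    so only the bidiagonal matrix is returned. The input is not modified.
    """
    b = [row[:] for row in a]
    m, n = len(b), len(b[0])
    for k in range(n):
        # Left reflector: zero column k below the diagonal.
        v = householder([b[i][k] for i in range(k, m)])
        if v is not None:
            for j in range(k, n):
                dot = sum(v[i - k] * b[i][j] for i in range(k, m))
                for i in range(k, m):
                    b[i][j] -= 2.0 * dot * v[i - k]
        # Right reflector: zero row k to the right of the superdiagonal.
        if k < n - 2:
            v = householder([b[k][j] for j in range(k + 1, n)])
            if v is not None:
                for i in range(k, m):
                    dot = sum(v[j - k - 1] * b[i][j] for j in range(k + 1, n))
                    for j in range(k + 1, n):
                        b[i][j] -= 2.0 * dot * v[j - k - 1]
    return b


def frobenius(a):
    """Frobenius norm, which orthogonal transformations leave unchanged."""
    return math.sqrt(sum(t * t for row in a for t in row))


def max_off_bidiagonal(b):
    """Largest magnitude outside the diagonal and first superdiagonal."""
    return max(
        abs(t)
        for i, row in enumerate(b)
        for j, t in enumerate(row)
        if j != i and j != i + 1
    )


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0], [2.0, 0.0, 1.0]]
    b = bidiagonalize(a)
    print("off-bidiagonal residual:", max_off_bidiagonal(b))
    print("norm before / after:", frobenius(a), frobenius(b))
```

Because every step is an orthogonal transformation, the bidiagonal result has the same singular values as the input, which is why the reduction can serve as a pre-processing step for the Singular Value Decomposition. The sketch checks this only indirectly, through the preserved Frobenius norm.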

Cited by 25 publications (28 citation statements)
References: 25 publications
“…We may expect that these results generalize somewhat to other linear algebra algorithms and even any algorithm that can be expressed by a DAG of fine-grain tasks. Preliminary experiments using tile algorithms for two-sided transformations, i.e., the Hessenberg reduction [22] (first step for the standard eigenvalue problem) and the bidiagonal reduction [23] (first step for the singular value decomposition), show promising results.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The reduction to symmetric band tridiagonal form can be easily derived for the upper case. All the operations will be then based on the LQ factorization numerical kernels, as described in Ltaief et al [18]. Most of the kernels from the first stage are compute-intensive and rely on Level 3 BLAS operations (i.e., matrix-matrix multiplication) to achieve high performance.…”
Section: High Performance Fine-grained and Memory-aware Kernels
Citation type: mentioning
confidence: 99%
“…The development of high performance DLA algorithms for homogeneous multicores has been successful in some cases, like the one-sided factorizations [4], and difficult for others, like the two-sided factorizations [5]. The situation is similar for GPUs - some algorithms map well, others are more challenging.…”
Section: Hybrid DLA Algorithms
Citation type: mentioning
confidence: 99%