Mark Tygert scite author profile

Randomized algorithms for the low-rank approximation of matrices

We describe two recently proposed randomized algorithms for the construction of low-rank approximations to matrices, and demonstrate their application (inter alia) to the evaluation of the singular value decompositions of numerically low-rank matrices. Being probabilistic, the schemes described here have a finite probability of failure; in most cases, this probability is negligible (10⁻¹⁷ is a typical value). In many situations, the new procedures are considerably more efficient and reliable than the classical (deterministic) ones; they also parallelize naturally. We present several numerical examples to illustrate the performance of the schemes.

Keywords: matrix | SVD | PCA

Low-rank approximation of linear operators is ubiquitous in applied mathematics, scientific computing, numerical analysis, and a number of other areas. In this note, we restrict our attention to two classical forms of such approximations, the singular value decomposition (SVD) and the interpolative decomposition (ID). The definition and properties of the SVD are widely known; we refer the reader to ref. 1 for a detailed description. The definition and properties of the ID are summarized in Subsection 1.1 below.

Below, we discuss two randomized algorithms for the construction of the IDs of matrices. Algorithm I is designed for situations where the adjoint A* of the m × n matrix A to be decomposed can be applied to arbitrary vectors in a "fast" manner; its CPU time requirements are typically proportional to k·C_A* + k·m + k²·n, where k is the rank of the approximating matrix and C_A* is the cost of applying A* to a vector. Algorithm II is designed for arbitrary matrices, and its CPU time requirement is typically proportional to m·n·log(k) + k²·n.
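A minimal NumPy sketch of a randomized ID, assuming a dense Gaussian test matrix R. Algorithm II instead relies on a structured random transform to reach the m·n·log(k) cost, so this dense version costs roughly l·m·n. The function names `randomized_id` and `greedy_column_pivots`, the oversampling l = k + 8, and the greedy Gram-Schmidt column selection are illustrative assumptions, not the paper's code. The sketch compresses A to Y = R A, chooses k columns of Y by pivoting, and solves for an interpolation matrix P so that A ≈ A[:, J] P.

```python
import numpy as np


def greedy_column_pivots(Y, k):
    """Pick k column indices of Y by Gram-Schmidt with column pivoting."""
    Y = Y.copy()
    cols = []
    for _ in range(k):
        norms = np.linalg.norm(Y, axis=0)
        norms[cols] = -1.0  # never pick the same column twice
        j = int(np.argmax(norms))
        cols.append(j)
        q = Y[:, j] / np.linalg.norm(Y[:, j])
        Y = Y - np.outer(q, q.conj() @ Y)  # remove the chosen direction
    return cols


def randomized_id(A, k, l=None, rng=None):
    """Return (J, P) with A ~= A[:, J] @ P; R is Gaussian (an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    l = k + 8 if l is None else l
    R = rng.standard_normal((l, m))  # l x m random test matrix
    Y = R @ A  # l x n sketch; with a fast A*, form it as (A* R*)*
    J = greedy_column_pivots(Y, k)
    # Interpolation coefficients: least-squares solve Y[:, J] @ P = Y
    P, *_ = np.linalg.lstsq(Y[:, J], Y, rcond=None)
    return J, P
```

For an exactly rank-k matrix the reconstruction A[:, J] @ P is exact up to rounding, and P[:, J] is the k × k identity, which is what makes the factorization interpolative: the chosen columns of A are reproduced verbatim.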
We also describe a scheme converting the ID of a matrix into its SVD for a cost proportional to k²·(m + n). Space constraints preclude us from reviewing the extensive literature on the subject; for a detailed survey, we refer the reader to ref. 2. Throughout this note, we denote the adjoint of a matrix A by A*, and the spectral (l²-operator) norm of A by ‖A‖₂; as is well known, ‖A‖₂ is the greatest singular value of A. Furthermore, we assume that our matrices have complex entries (as opposed to real); real versions of the algorithms under discussion are quite similar to the complex ones.

This note has the following structure: Section 1 summarizes several known facts. Section 2 describes randomized algorithms for the low-rank approximation of matrices. Section 3 illustrates the performance of the algorithms via several numerical examples. Section 4 contains conclusions, generalizations, and possible directions for future research.

Section 1: Preliminaries

In this section, we discuss two constructions from numerical analysis, to be used in the remainder of the note.

Subsection 1.1: Interpolative Decompositions. In this subsection, we define interpolative decompositions (IDs) and summarize their properties.

The following lemma states that, for any m × n matrix A of rank k, there exist an...
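One standard way to carry out an ID-to-SVD conversion, sketched here in NumPy under the assumption that the ID is stored as a skeleton B = A[:, J] (m × k) and an interpolation matrix P (k × n): take QR factorizations of B and P*, then an SVD of the small k × k product. The two QR steps cost O(k²·m) and O(k²·n) and the small SVD costs O(k³), consistent with the k²·(m + n) scaling when k ≤ min(m, n). The function name `id_to_svd` is hypothetical.

```python
import numpy as np


def id_to_svd(B, P):
    """Given A ~= B @ P (B is m x k, P is k x n), return U, s, V with
    A ~= U @ np.diag(s) @ V.conj().T."""
    Q1, R1 = np.linalg.qr(B)  # B = Q1 R1, Q1 is m x k
    Q2, R2 = np.linalg.qr(P.conj().T)  # P* = Q2 R2, Q2 is n x k
    # B P = Q1 (R1 R2*) Q2*, so only a k x k SVD is needed.
    W, s, Zh = np.linalg.svd(R1 @ R2.conj().T)
    U = Q1 @ W
    V = Q2 @ Zh.conj().T
    return U, s, V
```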

A Randomized Algorithm for Principal Component Analysis

Rokhlin

Szlam

Tygert

2010

SIAM J. Matrix Anal. & Appl.

Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have not come with guarantees of good accuracy, unless one or both dimensions of the matrix being approximated are small. We describe an efficient algorithm for the low-rank approximation of matrices that produces accuracy very close to the best possible, for matrices of arbitrary sizes. We illustrate our theoretical results via several numerical examples.
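The abstract does not spell out the method. The following is a minimal NumPy sketch of a commonly used randomized scheme of this kind (a Gaussian sketch followed by a few power iterations, re-orthonormalizing after each multiplication), assuming the data matrix has already been centered. The name `randomized_pca`, the oversampling l = k + 10, and the default of q = 2 iterations are illustrative assumptions, not values taken from the paper. The power iterations sharpen the decay of the singular values seen by the sketch, which is what lets such schemes approach the best possible accuracy when the spectrum decays slowly.

```python
import numpy as np


def randomized_pca(A, k, l=None, q=2, rng=None):
    """Rank-k approximation A ~= U diag(s) V* via a randomized range finder
    with q power iterations (A is assumed to be centered already)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    l = k + 10 if l is None else l
    G = rng.standard_normal((n, l))
    Q, _ = np.linalg.qr(A @ G)  # m x l basis for the sampled range
    for _ in range(q):
        # Apply (A A*) once more, orthonormalizing to limit rounding loss.
        Q, _ = np.linalg.qr(A.conj().T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # SVD of the small l x n matrix Q* A, then lift back to m dimensions.
    W, s, Vh = np.linalg.svd(Q.conj().T @ A, full_matrices=False)
    return (Q @ W)[:, :k], s[:k], Vh[:k].conj().T
```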

A fast randomized algorithm for the approximation of matrices

Woolfe

Liberty

Rokhlin

et al. 2008

Applied and Computational Harmonic Analysis

We introduce a randomized procedure that, given an m × n matrix A and a positive integer k, approximates A with a matrix Z of rank k. The algorithm relies on applying a structured l × m random matrix R to each column of A, where l is an integer near to, but greater than, k. The structure of R allows us to apply it to an arbitrary m × 1 vector at a cost proportional to m log(l); the resulting procedure can construct a rank-k approximation Z from the entries of A at a cost proportional to mn log(k) + l²(m + n). We prove several bounds on the accuracy of the algorithm; one such bound guarantees that the spectral norm ‖A − Z‖ of the discrepancy between A and Z is of the same order as max{m, n} times the (k + 1)st greatest singular value σ_{k+1} of A, with small probability of large deviations. In contrast, the classical pivoted "QR" decomposition algorithms (such as Gram-Schmidt or Householder) require at least kmn floating-point operations in order to compute a similarly accurate rank-k approximation. In practice, the algorithm of this paper is faster than the classical algorithms, as long as k is neither very small nor very large. Furthermore, the algorithm operates reliably independently of the structure of the matrix A, can access each column of A independently and at most twice, and parallelizes naturally. The results are illustrated via several numerical examples.
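A minimal NumPy sketch of a subsampled randomized Fourier transform, one common form of a structured random matrix like R above, assuming R = sqrt(m/l)·S F D, where D is a diagonal of random unit-modulus entries, F is the unitary m × m DFT, and S keeps l randomly chosen rows. Applying a full FFT costs O(m log m) per column rather than the O(m log l) stated in the abstract, which would need a pruned FFT. The names `srft_sketch` and `low_rank_from_sketch` are hypothetical, and the second function is a simplification: it projects A onto the row space captured by Y = R A, giving an approximation of rank at most l rather than the paper's rank-k Z.

```python
import numpy as np


def srft_sketch(A, l, rng=None):
    """Return Y = R @ A for an l x m subsampled randomized Fourier transform R."""
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[0]
    d = np.exp(2j * np.pi * rng.random(m))  # random unit-modulus diagonal D
    FDA = np.fft.fft(d[:, None] * A, axis=0, norm="ortho")  # F D A via FFT
    rows = rng.choice(m, size=l, replace=False)  # S: keep l random rows
    return np.sqrt(m / l) * FDA[rows]


def low_rank_from_sketch(A, Y):
    """Project the rows of A onto the row space captured by Y."""
    Q, _ = np.linalg.qr(Y.conj().T)  # n x l orthonormal basis
    return (A @ Q) @ Q.conj().T
```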

An Algorithm for the Principal Component Analysis of Large Data Sets

Halko

Martinsson

Shkolnisky

et al. 2011

SIAM J. Sci. Comput.

Recently popularized randomized methods for principal component analysis (PCA) efficiently and reliably produce nearly optimal accuracy, even on parallel processors, unlike the classical (deterministic) alternatives. We adapt one of these randomized methods for use with data sets that are too large to be stored in random-access memory (RAM). (The traditional terminology is that our procedure works efficiently out-of-core.) We illustrate the performance of the algorithm via several numerical examples. For example, we report on the PCA of a data set stored on disk that is so large that less than a hundredth of it can fit in our computer's RAM.
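The key constraint out-of-core is that A can only be read in pieces. A minimal NumPy sketch of one way to organize a randomized PCA around that constraint, assuming A is available as a sequence of row blocks that can be re-read from disk: each pass streams the blocks once and keeps only n × l and m × l arrays in memory. The callable `read_blocks` stands in for the disk reader and is hypothetical, as is the function name; this is a simplified two-pass variant, not the paper's exact procedure.

```python
import numpy as np


def out_of_core_pca(read_blocks, n, k, l=None, rng=None):
    """Rank-k approximation A ~= U diag(s) V* from two passes over row blocks.

    read_blocks() must return a fresh iterator over the row blocks of A
    (a hypothetical interface standing in for reads from disk).
    """
    rng = np.random.default_rng() if rng is None else rng
    l = k + 10 if l is None else l
    G = rng.standard_normal((n, l))
    # Pass 1: Z = A* A G, accumulated block by block (A* A = sum of A_i* A_i).
    Z = sum(blk.conj().T @ (blk @ G) for blk in read_blocks())
    Q, _ = np.linalg.qr(Z)  # n x l orthonormal basis for the row-space sketch
    # Pass 2: B = A Q, stacked block by block (m x l).
    B = np.vstack([blk @ Q for blk in read_blocks()])
    Uh, s, Wh = np.linalg.svd(B, full_matrices=False)
    V = Q @ Wh.conj().T  # A ~= B Q* = Uh diag(s) (Q Wh*)*
    return Uh[:, :k], s[:k], V[:, :k]
```

Usage: for a matrix held in memory for testing, `read_blocks` can be as simple as `lambda: (A[i:i + 50] for i in range(0, A.shape[0], 50))`; a real out-of-core reader would yield blocks loaded from disk instead.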
