We describe two recently proposed randomized algorithms for the construction of low-rank approximations to matrices, and demonstrate their application (inter alia) to the evaluation of the singular value decompositions of numerically low-rank matrices. Being probabilistic, the schemes described here have a finite probability of failure; in most cases, this probability is rather negligible (10⁻¹⁷ is a typical value). In many situations, the new procedures are considerably more efficient and reliable than the classical (deterministic) ones; they also parallelize naturally. We present several numerical examples to illustrate the performance of the schemes.

matrix | SVD | PCA

Low-rank approximation of linear operators is ubiquitous in applied mathematics, scientific computing, numerical analysis, and a number of other areas. In this note, we restrict our attention to two classical forms of such approximations, the singular value decomposition (SVD) and the interpolative decomposition (ID). The definition and properties of the SVD are widely known; we refer the reader to ref. 1 for a detailed description. The definition and properties of the ID are summarized in Subsection 1.1 below.

Below, we discuss two randomized algorithms for the construction of the IDs of matrices. Algorithm I is designed to be used in situations where the adjoint A* of the m × n matrix A to be decomposed can be applied to arbitrary vectors in a "fast" manner, and has CPU time requirements typically proportional to k·C_{A*} + k·m + k²·n, where k is the rank of the approximating matrix, and C_{A*} is the cost of applying A* to a vector. Algorithm II is designed for arbitrary matrices, and its CPU time requirement is typically proportional to m·n·log(k) + k²·n. We also describe a scheme converting the ID of a matrix into its SVD for a cost proportional to k²·(m + n).

Space constraints preclude us from reviewing the extensive literature on the subject; for a detailed survey, we refer the reader to ref. 2.

Throughout this note, we denote the adjoint of a matrix A by A*, and the spectral (l²-operator) norm of A by ‖A‖₂; as is well known, ‖A‖₂ is the greatest singular value of A. Furthermore, we assume that our matrices have complex entries (as opposed to real); real versions of the algorithms under discussion are quite similar to the complex ones.

This note has the following structure: Section 1 summarizes several known facts. Section 2 describes randomized algorithms for the low-rank approximation of matrices. Section 3 illustrates the performance of the algorithms via several numerical examples. Section 4 contains conclusions, generalizations, and possible directions for future research.

Section 1: Preliminaries

In this section, we discuss two constructions from numerical analysis, to be used in the remainder of the note.

Subsection 1.1: Interpolative Decompositions. In this subsection, we define interpolative decompositions (IDs) and summarize their properties. The following lemma states that, for any m × n matrix A of rank k, there exist an…
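The ID-to-SVD conversion mentioned in this abstract admits a compact implementation. The following is a minimal numpy sketch, assuming an ID A ≈ BP (with B an m × k subset of A's columns and P a k × n interpolation matrix) is already in hand; the function name and structure are illustrative, not the paper's reference code.

```python
import numpy as np

def id_to_svd(B, P):
    """Convert an interpolative decomposition A ~= B P into an SVD
    A ~= U diag(s) Vh, using O(k^2 (m + n)) operations."""
    # QR-factorize the adjoint of the interpolation matrix: P* = Q R.
    Q, R = np.linalg.qr(P.conj().T)                 # Q: n x k, R: k x k
    # SVD of the small m x k product B R*.
    U, s, Wh = np.linalg.svd(B @ R.conj().T, full_matrices=False)
    # Since P = R* Q*, we have A ~= B P = B R* Q* = U diag(s) (Q W)*.
    Vh = Wh @ Q.conj().T                            # (Q W)* = W* Q*,  k x n
    return U, s, Vh
```

Every step above touches only k-column or k-row factors, which is where the stated k²·(m + n) cost comes from.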
A sketch of a matrix A is another matrix B which is significantly smaller than A, but still approximates it well. Finding such sketches efficiently is an important building block in modern algorithms for approximating, for example, the PCA of massive matrices. This task is made more challenging in the streaming model, where each row of the input matrix can be processed only once and storage is severely limited.

In this paper, we adapt a well-known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives n rows of a large matrix A ∈ R^{n×m} one after the other, in a streaming fashion. It maintains a sketch B ∈ R^{ℓ×m} containing only ℓ ≪ n rows but still guarantees that AᵀA ≈ BᵀB. More accurately, for any unit vector x, 0 ≤ ‖Ax‖² − ‖Bx‖² ≤ 2‖A‖²_F/ℓ. This algorithm's error decays proportionally to 1/ℓ using O(mℓ) space. In comparison, random-projection, hashing, or sampling based algorithms produce convergence bounds proportional to 1/√ℓ. Sketch updates per row in A require amortized O(mℓ) operations and the algorithm is perfectly parallelizable. Our experiments corroborate the algorithm's scalability and improved convergence rate. The presented algorithm also stands out in that it is deterministic, simple to implement, and elementary to prove.
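As a concrete illustration of the update this abstract describes, here is a minimal numpy sketch of the Frequent Directions loop, assuming ℓ ≤ m; the doubled working buffer and helper names are implementation choices for clarity, not the paper's reference code.

```python
import numpy as np

def _shrink(B, ell):
    """SVD the buffer and subtract the ell-th largest squared singular
    value from all of them, zeroing out at least half of the rows."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)   # assumes ell <= m
    s = np.sqrt(np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0))
    out = np.zeros_like(B)
    out[: len(s)] = s[:, None] * Vt
    return out

def frequent_directions(row_stream, m, ell):
    """Stream rows of an n x m matrix A; maintain an ell x m sketch B
    with A^T A ~= B^T B (spectral error at most 2*||A||_F^2 / ell)."""
    B = np.zeros((2 * ell, m))       # working buffer, twice the sketch size
    n_used = 0
    for row in row_stream:
        if n_used == 2 * ell:        # buffer full: shrink
            B = _shrink(B, ell)
            n_used = ell - 1         # rows ell-1 .. 2*ell-1 are now zero
        B[n_used] = row
        n_used += 1
    return _shrink(B, ell)[:ell]     # final shrink, keep ell rows
```

The shrink step costs one SVD of a 2ℓ × m matrix but frees ℓ+1 buffer rows, so the per-row cost is amortized O(mℓ), matching the abstract's claim.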
We introduce a randomized procedure that, given an m × n matrix A and a positive integer k, approximates A with a matrix Z of rank k. The algorithm relies on applying a structured l × m random matrix R to each column of A, where l is an integer near to, but greater than, k. The structure of R allows us to apply it to an arbitrary m × 1 vector at a cost proportional to m log(l); the resulting procedure can construct a rank-k approximation Z from the entries of A at a cost proportional to mn log(k) + l²(m + n). We prove several bounds on the accuracy of the algorithm; one such bound guarantees that the spectral norm ‖A − Z‖ of the discrepancy between A and Z is of the same order as max{m, n} times the (k+1)st greatest singular value σ_{k+1} of A, with small probability of large deviations. In contrast, the classical pivoted "QR" decomposition algorithms (such as Gram–Schmidt or Householder) require at least kmn floating-point operations in order to compute a similarly accurate rank-k approximation. In practice, the algorithm of this paper is faster than the classical algorithms, as long as k is neither very small nor very large. Furthermore, the algorithm operates reliably independently of the structure of the matrix A, can access each column of A independently and at most twice, and parallelizes naturally. The results are illustrated via several numerical examples.
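The following minimal numpy sketch shows the shape of such a procedure under simplifying assumptions: the structured R is realized as a subsampled randomized Fourier transform (random phases, full FFT, row subsampling), and an SVD-based projection stands in for the interpolative step the paper uses. The full FFT costs mn log(m) rather than the pruned mn log(k) the paper achieves, and all names are illustrative.

```python
import numpy as np

def randomized_low_rank(A, k, oversample=8):
    """Return factors B (m x k) and Vk (k x n) with A ~= B @ Vk,
    built from an l x n sketch Y = R A of A's row space."""
    m, n = A.shape
    l = min(k + oversample, m)            # l slightly larger than k
    # Structured R = S F D: random unitary diagonal D, FFT F, row subsample S.
    d = np.exp(2j * np.pi * np.random.rand(m))
    rows = np.random.choice(m, size=l, replace=False)
    Y = np.fft.fft(d[:, None] * A, axis=0)[rows, :]   # l x n sketch
    # Orthonormal directions spanning Y's row space; keep the k leading ones.
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    Vk = Vt[:k]                                       # k x n, orthonormal rows
    return A @ Vk.conj().T, Vk                        # A ~= (A Vk*) Vk
```

Note that A is read only to form the sketch Y and once more to form A Vk*, consistent with the abstract's claim that each column is accessed at most twice.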
The Fast Johnson–Lindenstrauss Transform (FJLT) was recently discovered by Ailon and Chazelle as a novel technique for performing fast dimension reduction with small distortion from ℓ₂^d to ℓ₂^k in time O(max{d log d, k³}); for k in the range [Ω(log d), O(d^{1/2})], this beats the O(dk) time achieved by naive multiplication by random dense matrices, an approach followed by several authors as a variant of the seminal result by Johnson and Lindenstrauss (JL) from the mid 80's. In this work we show how to significantly improve the running time to O(d log k) for k = O(d^{1/2−δ}), for any arbitrarily small fixed δ. This beats the better of FJLT and JL. Our analysis uses a powerful measure-concentration bound due to Talagrand applied to Rademacher series in Banach spaces (sums of vectors in Banach spaces with random signs). The set of vectors used is a real embedding of dual BCH code vectors over GF(2). We also discuss the number of random bits used and reduction to ℓ₁ space. The connection between geometry and discrete coding theory discussed here is interesting in its own right and may be useful in other algorithmic applications as well.
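To fix ideas, here is a minimal numpy sketch of an FJLT-flavored map: random sign flips, fast orthogonal mixing via a normalized FFT, then coordinate subsampling. This is a simplified variant for illustration only; neither Ailon–Chazelle's sparse-projection step nor this paper's dual-BCH-code construction (which is what yields the improved running time) is reproduced here.

```python
import numpy as np

def fast_jl(x, k, rng=np.random.default_rng(0)):
    """Map a d-dimensional vector x to k dimensions, approximately
    preserving its Euclidean norm, in O(d log d) time."""
    d = len(x)
    signs = rng.choice([-1.0, 1.0], size=d)      # random Rademacher signs
    mixed = np.fft.fft(signs * x) / np.sqrt(d)   # unitary mixing, O(d log d)
    idx = rng.choice(d, size=k, replace=False)   # subsample k coordinates
    return np.sqrt(d / k) * mixed[idx]           # rescale: E||out||^2 = ||x||^2
```

The sign flips and mixing spread the mass of x roughly evenly across coordinates, so that subsampling k of them concentrates sharply around the true norm.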