Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high "statistical leverage" and, thus, in a very precise statistical sense, exert a disproportionately large "influence" on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.

randomized algorithms | singular value decomposition | principal components analysis | interpretation | statistical leverage

Modern datasets are often represented by large matrices, since an m × n real-valued matrix A provides a natural structure for encoding information about m objects, each of which is described by n features. Examples of such objects include documents, genomes, stocks, hyperspectral images, and web groups. Examples of the corresponding features are terms, environmental conditions, temporal resolution, frequency resolution, and individual web users. In many cases, an important step in the analysis of such data is to construct a compressed representation of A that may be easier to analyze and interpret in light of a corpus of field-specific knowledge. The most common such representation is obtained by truncating the Singular Value Decomposition (SVD) at some number $k \ll \min\{m, n\}$ of terms. For example, Principal Components Analysis (PCA) is just this procedure applied to a suitably normalized data correlation matrix.

Recall the SVD of a general matrix $A \in \mathbb{R}^{m \times n}$. Given $A$, there exist singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and orthonormal singular vectors $\{u_t\}_{t=1}^{m} \subset \mathbb{R}^m$ and $\{v_t\}_{t=1}^{n} \subset \mathbb{R}^n$ such that $A = \sum_{t=1}^{r} \sigma_t u_t v_t^T$, where $r = \operatorname{rank}(A)$. The SVD is widely used in data analysis, often via methods such as PCA, in large part because the subspaces spanned by the vectors (typically obtained after truncating the SVD to some small number k of terms) provide the best rank-k approximation to the data matrix A. If $k \le r = \operatorname{rank}(A)$ and we define $A_k = \sum_{t=1}^{k} \sigma_t u_t v_t^T$, then $\|A - A_k\|_F = \min_{B : \operatorname{rank}(B) \le k} \|A - B\|_F$.
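To make the truncated SVD and leverage-based column/row selection concrete, the following is a minimal Python (NumPy) sketch on a synthetic matrix. It is not the paper's exact algorithm: the sample sizes `c` and `r_rows`, the fixed-size sampling without replacement, and the choice of middle factor $U = C^+ A R^+$ are illustrative assumptions.

```python
# Sketch: normalized statistical leverage scores from the top-k singular
# vectors, leverage-based column/row sampling, and a CUR-style approximation.
import numpy as np

rng = np.random.default_rng(0)

m, n, k, c, r_rows = 60, 40, 5, 12, 12          # c, r_rows: assumed sample sizes
A = (rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
     + 0.1 * rng.standard_normal((m, n)))        # approximately rank-k data

# Best rank-k approximation A_k from the truncated SVD (Eckart-Young).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Normalized leverage score of column j: (1/k) * sum of squares of the j-th
# entries of the top-k right singular vectors; these sum to 1, so they define
# a sampling distribution over columns.
col_lev = np.sum(Vt[:k, :] ** 2, axis=0) / k
cols = rng.choice(n, size=c, replace=False, p=col_lev)

# Row leverage scores from the top-k left singular vectors, analogously.
row_lev = np.sum(U[:, :k] ** 2, axis=1) / k
rows = rng.choice(m, size=r_rows, replace=False, p=row_lev)

C, R = A[:, cols], A[rows, :]
Umid = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)  # Frobenius-optimal U given C, R
A_cur = C @ Umid @ R

rel_err = lambda B: np.linalg.norm(A - B, "fro") / np.linalg.norm(A, "fro")
print(f"rank-{k} SVD error: {rel_err(A_k):.3f}   CUR error: {rel_err(A_cur):.3f}")
```

Columns and rows with large leverage scores are precisely those exerting disproportionate influence on the best rank-k fit, which is why sampling proportional to these scores underlies the relative-error guarantees described above; the analyzed algorithm makes independent per-column sampling decisions based on these scores, whereas the sketch draws a fixed-size sample for brevity.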