Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/468

Single-Pass PCA of Large High-Dimensional Data

Abstract: Principal component analysis (PCA) is a fundamental dimension reduction tool in statistics and machine learning. For large and high-dimensional data, computing the PCA (i.e., the top singular vectors of the data matrix) becomes a challenging task. In this work, a single-pass randomized algorithm is proposed to compute PCA with only one pass over the data. It is suitable for processing extremely large and high-dimensional data stored in slow memory (hard disk) or the data generated in a streaming fashion. Exper…
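To make the one-pass idea concrete, here is a minimal numpy sketch of a generic two-sided sketching scheme in the spirit of Halko et al. (2011) and Tropp et al. (2017). It illustrates the single-pass principle only; the function name, the block-streaming interface, and the oversampling defaults are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def single_pass_pca(row_blocks, m, n, k, p=10, seed=0):
    """Approximate rank-k SVD of an m-by-n matrix A from ONE pass over its rows.

    row_blocks yields (offset, block) pairs with block.shape == (b, n),
    e.g. read sequentially from disk or received as a stream.
    """
    rng = np.random.default_rng(seed)
    l = k + p                              # range-sketch size (oversampled)
    s = 2 * l + 1                          # slightly larger co-range sketch
    Omega = rng.standard_normal((n, l))    # right test matrix
    Psi = rng.standard_normal((s, m))      # left test matrix
    Y = np.zeros((m, l))                   # will accumulate A @ Omega
    W = np.zeros((s, n))                   # will accumulate Psi @ A
    for off, block in row_blocks:          # the single pass over A
        b = block.shape[0]
        Y[off:off + b] = block @ Omega
        W += Psi[:, off:off + b] @ block
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for range(A)
    # Recover X with A ~= Q @ X from the co-range sketch: (Psi @ Q) X = W.
    X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)
    Uh, S, Vt = np.linalg.svd(X, full_matrices=False)
    return (Q @ Uh)[:, :k], S[:k], Vt[:k]  # approximate top-k singular triplets
```

For PCA proper the columns should be mean-centered first; the column means can be accumulated during the same pass and folded into the sketches afterwards, so the one-pass property is preserved.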

Cited by 30 publications (31 citation statements, published 2018–2024). References 16 publications (52 reference statements).

Citation statements, ordered by relevance:
“…The randomized subspace iteration algorithm, which is a hybrid of Krylov and randomized methodologies, was developed based on randomized SVD [133,134]. In pass-efficient or one-pass randomized SVD, some tricks to reduce the number of passes have been considered [135,136]. TeraPCA, a software tool for population genetics studies, uses the Mailman algorithm to accelerate the expectation-maximization algorithms for PCA [137,138].…”
Section: Future Perspective
confidence: 99%
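The randomized subspace iteration mentioned in this statement has a compact standard form (per Halko et al.); the numpy sketch below shows the usual q-iteration template for a matrix held as a dense array. It is illustrative, not the cited papers' verbatim pseudocode.

```python
import numpy as np

def randomized_subspace_iteration(A, k, p=10, q=2, seed=0):
    """Rank-k truncated SVD via randomized subspace (power) iteration.

    Each of the q iterations costs two extra passes over A but sharpens
    the captured subspace when the singular values decay slowly.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)   # re-orthonormalize every half-step
        Q, _ = np.linalg.qr(A @ Q)     # to keep the basis well conditioned
    B = Q.T @ A                        # small (k+p)-by-n projected matrix
    Uh, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], S[:k], Vt[:k]
```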
“…A simple adjustment to the extended 1-view method makes it considerably more accurate, and it can outperform the previously mentioned 1-view method when the goal is to minimize the memory required by the randomized matrix sketches. Furthermore, building on the 1-view and power-iteration methods proposed in [58,59], we proposed a highly pass-efficient randomized block Krylov algorithm suited to approximating large matrices stored out-of-core in either row-major or column-major format.…”
Section: Discussion
confidence: 99%
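For context on the block Krylov approach this statement refers to, the generic randomized block Krylov template (in the style of Musco and Musco, 2015) is sketched below in numpy; the citing paper's pass-efficient out-of-core variant reorganizes these passes, which this sketch does not attempt to reproduce.

```python
import numpy as np

def randomized_block_krylov(A, k, q=3, seed=0):
    """Rank-k SVD from the Krylov space [G, (A A^T)G, ..., (A A^T)^q G],
    where G = A @ Omega for a Gaussian test matrix Omega."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    V = A @ rng.standard_normal((n, k))
    blocks = [V]
    for _ in range(q):
        V = A @ (A.T @ V)       # next Krylov block: two passes over A
        blocks.append(V)        # (per-block QR would add numerical safety)
    Q, _ = np.linalg.qr(np.hstack(blocks))   # basis of size m-by-(q+1)k
    B = Q.T @ A
    Uh, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uh)[:, :k], S[:k], Vt[:k]
```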
“…The above 1-view procedure is advantageous for large matrices that are stored out-of-core in row-major or column-major format. Of course, when the matrix is stored in column-major format and accessed column-wise, the procedure is applied to the matrix's transpose [58]. However, this approach applies neither to general streaming matrices nor to the Jacobian matrices motivating this study.…”
Section: Looping Over All the Rows In
confidence: 99%
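The transpose trick described in this statement amounts to a thin wrapper around any row-streaming sketch. Assuming the hypothetical single_pass_pca from the earlier sketch, a column-major matrix could be handled as follows:

```python
# Hypothetical wrapper: stream the column blocks of A as row blocks of A.T,
# run the one-pass sketch on A.T, then swap the factors back.
def single_pass_pca_colmajor(col_blocks, m, n, k, **kw):
    blocks_T = ((off, blk.T) for off, blk in col_blocks)   # blk: (m, b) columns
    Up, S, Vpt = single_pass_pca(blocks_T, n, m, k, **kw)  # SVD of A.T (n-by-m)
    return Vpt.T, S, Up.T    # A = U S V^T with U = Vp and V^T = Up^T
```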
“…It is even possible to find a matrix approximation with a single pass over the data [14]. This enables the efficient computation of a low-rank approximation of dense matrices that cannot be stored completely in fast memory [30].…”
confidence: 99%
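As a concrete illustration of the out-of-core case in [30], a dense matrix on disk can be fed to the hypothetical single_pass_pca sketch above through numpy.memmap, so each entry is read exactly once; the file name and sizes here are placeholders.

```python
import numpy as np

m, n, k = 100_000, 2_000, 20
# Placeholder file of float64 values stored in row-major order.
A = np.memmap("big_matrix.f64", dtype=np.float64, mode="r", shape=(m, n))
blocks = ((i, np.asarray(A[i:i + 4096])) for i in range(0, m, 4096))
U, S, Vt = single_pass_pca(blocks, m, n, k)   # one sequential read of the file
```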