Motivation: Principal Component Analysis is a key tool in the study of population structure in human genetics. As modern datasets grow ever larger, traditional approaches that load the entire dataset into system memory (RAM) become impractical, and out-of-core implementations are the only viable alternative.

Results: We present TeraPCA, a C++ implementation of the Randomized Subspace Iteration method for performing Principal Component Analysis of large-scale datasets. TeraPCA can be applied both in-core and out-of-core and operates successfully even on commodity hardware with only a few gigabytes of system memory. Moreover, TeraPCA has minimal external dependencies, requiring only working installations of the BLAS and LAPACK libraries. When applied to a dataset of one million individuals genotyped on one million markers, TeraPCA requires less than 5 h (in multi-threaded mode) to accurately compute the 10 leading principal components. An extensive experimental analysis shows that TeraPCA is both fast and accurate and is competitive with current state-of-the-art software for the same task.

Availability and implementation: Source code and documentation are available at https://github.com/aritra90/TeraPCA.

Supplementary information: Supplementary data are available at Bioinformatics online.
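For intuition, here is a minimal in-core sketch of the randomized subspace iteration underlying TeraPCA, written in Python/NumPy for readability. TeraPCA itself is C++ and processes the matrix out-of-core; the function name, parameter defaults, and fixed iteration count below are illustrative assumptions, not TeraPCA's actual interface.

```python
import numpy as np

def randomized_subspace_iteration(A, k, oversample=10, n_iter=10, seed=0):
    """Approximate the top-k PCA of A (samples x markers) by subspace
    iteration on the covariance operator, using only products with A and
    A.T (the covariance matrix itself is never formed)."""
    rng = np.random.default_rng(seed)
    A = A - A.mean(axis=0)                    # center each marker
    n = A.shape[1]
    # Random starting subspace, slightly oversampled for robustness.
    V = np.linalg.qr(rng.standard_normal((n, k + oversample)))[0]
    for _ in range(n_iter):
        # One power step with re-orthonormalization.
        V = np.linalg.qr(A.T @ (A @ V))[0]
    # Rayleigh-Ritz: project A onto the subspace and take its small SVD.
    U, s, Wt = np.linalg.svd(A @ V, full_matrices=False)
    scores = U[:, :k] * s[:k]                 # PC scores of the samples
    loadings = (V @ Wt.T)[:, :k]              # marker loadings
    explained = s[:k] ** 2 / (A.shape[0] - 1) # leading covariance eigenvalues
    return scores, loadings, explained
```

In an out-of-core variant, only the (markers x subspace) basis V and the small projected matrices stay in memory, while blocks of A are streamed from disk at each multiplication; this is what allows biobank-scale matrices to be processed on a machine with a few gigabytes of RAM.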
A number of applications require computing the trace of a matrix that is available only implicitly through a function. A common example of such a function is the inverse of a large, sparse matrix, which is the focus of this paper. When evaluating the function is expensive, the task is computationally challenging because the standard approach, a Monte Carlo method, converges slowly. We present a different approach that exploits the pattern correlation, if present, between the diagonal of the inverse of the matrix and the diagonal of some approximate inverse that can be computed inexpensively. We leverage various sampling and fitting techniques to fit the diagonal of the approximation to the diagonal of the inverse. Depending on the quality of the approximate inverse, our method may serve as a standalone kernel that provides a fast trace estimate from a small number of samples. Furthermore, in some cases the method can be used as a variance reduction technique for Monte Carlo; this choice is made dynamically by our algorithm. An extensive set of experiments with various technique combinations on several matrices from real applications demonstrates the potential of our method.
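As a rough illustration of the two ingredients, the sketch below contrasts the slowly converging Monte Carlo (Hutchinson) baseline with a fitted estimate built from a cheap approximate diagonal. The `solve` callback, the plain linear fit, and the caller-supplied index sample are placeholder choices, not the specific sampling and fitting techniques evaluated in the paper.

```python
import numpy as np

def hutchinson_trace(solve, n, n_samples=100, seed=0):
    """Baseline Monte Carlo (Hutchinson) estimate of trace(A^{-1}).

    solve(v) must return A^{-1} v (e.g. via a preconditioned iterative
    solver). The error decays only as O(1/sqrt(n_samples))."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe vector
        total += z @ solve(z)
    return total / n_samples

def fitted_trace(solve, approx_diag, sample_idx):
    """Trace estimate from fitting a cheap approximate diagonal (e.g. of a
    sparse approximate inverse) to a few exactly computed entries of
    diag(A^{-1}). A simple linear fit stands in for the paper's various
    sampling and fitting strategies."""
    approx_diag = np.asarray(approx_diag, dtype=float)
    exact = []
    for i in sample_idx:
        e = np.zeros(approx_diag.size)
        e[i] = 1.0
        exact.append(solve(e)[i])             # exact entry (A^{-1})_{ii}
    a, b = np.polyfit(approx_diag[sample_idx], np.array(exact), deg=1)
    return float(np.sum(a * approx_diag + b))
```

In a variance-reduction mode, one would instead subtract the fitted surrogate inside the Monte Carlo estimator and add its known sum back; deciding between the standalone and variance-reduction uses is the kind of dynamic choice the abstract describes.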
We introduce EigenRec, a versatile and efficient Latent-Factor framework for Top-N Recommendations that includes the well-known PureSVD algorithm as a special case. EigenRec builds a low-dimensional model of an inter-item proximity matrix that combines a similarity component with a scaling operator designed to control the influence of prior item popularity on the final model. Viewing PureSVD within our framework provides intuition about its inner workings, exposes its inherent limitations, and paves the path towards painlessly improving its recommendation performance. A comprehensive set of experiments on the MovieLens and Yahoo datasets, based on widely applied performance metrics, indicates that EigenRec outperforms several state-of-the-art algorithms in terms of Standard and Long-Tail recommendation accuracy, exhibiting low susceptibility to sparsity even in its most extreme manifestations, the Cold-Start problems. At the same time, EigenRec has an attractive computational profile and can be readily applied in large-scale recommendation settings.
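To make the framework concrete, the following sketch builds an EigenRec-style model on a dense user-item matrix R. The function name and the scalar popularity exponent d are illustrative assumptions; the dense eigendecomposition is for clarity only (large-scale use would rely on sparse iterative eigensolvers). With d = 1 the proximity matrix reduces to R^T R, whose leading eigenvectors reproduce PureSVD.

```python
import numpy as np
from numpy.linalg import norm, eigh

def eigenrec_scores(R, f=50, d=0.5):
    """Sketch of an EigenRec-style Top-N model on a user-item matrix R
    (rows: users, columns: items). d scales the influence of item
    popularity; d = 1 with cosine similarity recovers PureSVD."""
    norms = norm(R, axis=0)
    norms[norms == 0] = 1.0                   # guard against empty items
    C = (R / norms).T @ (R / norms)           # cosine inter-item similarity
    W = np.outer(norms**d, norms**d) * C      # popularity-scaled proximity
    vals, vecs = eigh(W)                      # symmetric eigendecomposition
    V = vecs[:, np.argsort(vals)[::-1][:f]]   # top-f latent item factors
    return R @ V @ V.T                        # rank unseen items per user
```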
Generalized matrix functions (GMFs) extend the concept of a matrix function to rectangular matrices via the singular value decomposition (SVD). Several applications involving directed graphs, Hamiltonian dynamical systems, and optimization problems with low-rank constraints require the action of a GMF of a large, sparse matrix on a vector. We present a new method for applying GMFs to vectors based on Chebyshev interpolation. The method is matrix-free and requires no orthogonalization and minimal additional storage. We prove that our method is backward stable and compare it with existing approaches based on Lanczos bidiagonalization.

Introduction. First introduced in [21], generalized matrix functions (GMFs) extend the notion of matrix functions from square matrices to rectangular ones using the singular value decomposition (SVD). Although they are perhaps less well known than their "standard" counterparts, GMFs arise in a variety of applications, including communication metrics for directed graphs [3,11], matrix exponentials of Hamiltonian systems [14,13], including the graph wave equation [9,26], and regularization of ill-posed problems [20]. For additional theory and applications of GMFs see, for instance, [1,2,27] and the references therein. In all these applications, the quantity of interest is the action of a GMF on a vector. For small matrices, one can proceed directly by computing the full SVD of the matrix. If the matrix is large and sparse, algorithms based on Lanczos bidiagonalization have been proposed in [4].

In this paper, we present a new method for applying a GMF of a large, sparse matrix to a vector. Our method, which is based on Chebyshev interpolation, is "matrix-free": it needs only a routine that computes the action of the matrix (and its transpose) on a vector, and it uses only a small amount of additional memory, making it easy to parallelize and well suited to large-scale problems. Similar techniques have been used to accelerate the solution of large symmetric eigenvalue problems on multicore and GPU processors [5,6,17]. We verify the efficacy of our method with numerical experiments, which show it to be superior in terms of memory usage and, for certain problems, compute time. We also both prove and experimentally verify that our method is backward stable. The proof we give can, with minimal modification, be used to establish backward stability for the Chebyshev interpolation methods that are widely used for computing eigenvalues and (matrix) functions of symmetric matrices. To the best of our knowledge, this is the first backward stability result for these methods, and our analysis therefore fills an important gap in the literature.

Functions of matrices. We begin by recalling some basic facts about both standard and generalized matrix functions. We then use these to establish the properties of GMFs that form the foundation for our algorithm.
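Below is a minimal Python sketch of a Chebyshev approach of this kind. It rests on the identity f♦(A) b = U f(Σ) V^T b = A g(A^T A) b with g(t) = f(√t)/√t, which needs only products with A and A^T; the fixed degree, the dense spectral-norm computation, and the use of NumPy's chebinterpolate are simplifying assumptions for illustration, not the authors' implementation, which may differ in details.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def gmf_direct(A, f, b):
    """Reference route for small matrices: full SVD, then U f(S) V^T b."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ (f(s) * (Vt @ b))

def gmf_chebyshev(A, f, b, deg=80, sigma_max=None):
    """Matrix-free application of the GMF f#(A) b via Chebyshev interpolation.

    Interpolates g(t) = f(sqrt(t))/sqrt(t) on [0, sigma_max^2] and applies
    it to b with a Clenshaw recurrence using only products with A and A.T.
    Assumes g extends continuously to t = 0 (e.g. odd f) and f is vectorized."""
    n = A.shape[1]
    if sigma_max is None:
        sigma_max = np.linalg.norm(A, 2)  # exact spectral norm; estimated in practice
    bmax = sigma_max**2
    # Chebyshev coefficients of g after mapping t = (x + 1) * bmax / 2 to [-1, 1].
    g = lambda x: f(np.sqrt((x + 1) * bmax / 2)) / np.sqrt((x + 1) * bmax / 2)
    c = C.chebinterpolate(g, deg)
    # Clenshaw recurrence with the shifted operator X = (2/bmax) A^T A - I.
    X = lambda v: (2.0 / bmax) * (A.T @ (A @ v)) - v
    bk1 = np.zeros(n)
    bk2 = np.zeros(n)
    for k in range(deg, 0, -1):
        bk1, bk2 = 2.0 * X(bk1) - bk2 + c[k] * b, bk1
    return A @ (X(bk1) - bk2 + c[0] * b)
```

As a quick sanity check, f(σ) = σ gives the constant interpolant g = 1, so the routine returns A b; for small test matrices, gmf_direct provides a dense reference against which the Chebyshev result can be compared.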