We present a distributed-memory algorithm for the hierarchical compression of symmetric positive definite (SPD) matrices. Our method is based on GOFMM, an algorithm that appeared in doi:10.1145/3126908.3126921. For many SPD matrices, GOFMM enables compression and approximate matrix-vector multiplication in O(N log N) time, as opposed to the O(N^2) required for a dense matrix, but it supports only shared-memory parallelism. In this paper, we use the Message Passing Interface (MPI) to extend the ideas of GOFMM to the distributed-memory setting. We also propose and implement an asynchronous algorithm for faster multiplication. We present different usage scenarios on a selection of SPD matrices related to graphs, neural networks, and covariance operators, with results on the Texas Advanced Computing Center's "Stampede 2" system. We also compare with the STRUMPACK software package, which, to our knowledge, is the only other available software that can compress arbitrary SPD matrices in parallel. In our largest run, we compressed a 67M-by-67M matrix in less than three minutes and performed a multiplication with 512 vectors within five seconds on 6,144 Intel "Skylake" cores.
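To make the complexity claim concrete, the following is a minimal single-node sketch (Python/NumPy), not the GOFMM or MPI implementation itself, of the general idea behind hierarchical compression: off-diagonal blocks are replaced by low-rank factors, so an approximate matrix-vector product touches far less data than the dense O(N^2) product. The kernel matrix, the fixed rank, and all function names are illustrative assumptions.

# Toy HODLR-style illustration (not GOFMM): off-diagonal blocks of an SPD matrix
# are replaced by rank-r truncated SVD factors, so an approximate matvec costs
# roughly O(N r log N) instead of the O(N^2) dense cost.
import numpy as np

def compress(A, rank=8, leaf=64):
    """Recursively compress a (sub)matrix into a nested block structure."""
    n = A.shape[0]
    if n <= leaf:
        return {"dense": A.copy()}
    m = n // 2
    # Low-rank factors for the two off-diagonal blocks.
    U12, s12, V12 = np.linalg.svd(A[:m, m:], full_matrices=False)
    U21, s21, V21 = np.linalg.svd(A[m:, :m], full_matrices=False)
    return {
        "split": m,
        "A11": compress(A[:m, :m], rank, leaf),
        "A22": compress(A[m:, m:], rank, leaf),
        "U12": U12[:, :rank] * s12[:rank], "V12": V12[:rank, :],
        "U21": U21[:, :rank] * s21[:rank], "V21": V21[:rank, :],
    }

def matvec(node, x):
    """Approximate y = A @ x using the compressed representation."""
    if "dense" in node:
        return node["dense"] @ x
    m = node["split"]
    y1 = matvec(node["A11"], x[:m]) + node["U12"] @ (node["V12"] @ x[m:])
    y2 = matvec(node["A22"], x[m:]) + node["U21"] @ (node["V21"] @ x[:m])
    return np.concatenate([y1, y2])

# Example: an SPD kernel matrix whose off-diagonal blocks are numerically low rank.
pts = np.sort(np.random.rand(1024))
K = np.exp(-np.abs(pts[:, None] - pts[None, :]) ** 2 / 0.1) + 1e-3 * np.eye(1024)
tree = compress(K)
x = np.random.rand(1024)
print(np.linalg.norm(matvec(tree, x) - K @ x) / np.linalg.norm(K @ x))

The printed value is the relative error of the approximate matvec; GOFMM additionally uses geometry-oblivious sampling, adaptive ranks, and (in this paper) MPI parallelism, none of which this toy sketch attempts to reproduce.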
We present a distributed-memory algorithm for the approximate hierarchical factorization of symmetric positive definite (SPD) matrices. Our method is based on the distributed-memory GOFMM, an algorithm that appeared at SC'18.
We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton Hessian (GNH) matrix for the multilayer perceptron. The algorithm has a precomputation step and a sampling step. While it generally requires O(Nn) work to compute an entry (and the entire column) of the GNH matrix for a neural network with N parameters and n data points, our fast sampling algorithm reduces the cost to O(n + d/eps^2) work, where d is the output dimension of the network and eps is a prescribed accuracy (independent of N). One application of our algorithm is constructing the hierarchical-matrix (H-matrix) approximation of the GNH matrix for solving linear systems and eigenvalue problems. While it generally requires O(N^2) memory and O(N^3) work to store and factorize the GNH matrix, respectively, the H-matrix approximation requires only an O(N r_o) memory footprint and O(N r_o^2) work to factorize, where r_o << N is the maximum rank of the off-diagonal blocks of the GNH matrix. We demonstrate the performance of our fast algorithm and the H-matrix approximation on classification and autoencoder neural networks.
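For reference, the GNH can be written as G = sum_i J_i^T H_i J_i, where J_i is the d-by-N Jacobian of the network output with respect to the parameters at data point x_i and H_i is the Hessian of the loss with respect to the output (the identity for squared loss). The sketch below assembles this reference matrix naively for a tiny one-hidden-layer MLP to show where the O(Nn) per-entry cost comes from; it is not the paper's fast sampling algorithm, and all sizes and names are illustrative assumptions.

# Naive (reference) Gauss-Newton Hessian for a tiny one-hidden-layer MLP with
# squared loss: G = sum_i J_i^T J_i, with J_i the d-by-N Jacobian at point x_i.
# Even one entry of G built this way needs Jacobian information for all n points,
# i.e. O(N n) work, which the fast sampling algorithm in the abstract avoids.
import numpy as np

rng = np.random.default_rng(0)
n, p, h, d = 50, 4, 8, 3                      # data points, input, hidden, output dims
W1, W2 = rng.standard_normal((h, p)), rng.standard_normal((d, h))
X = rng.standard_normal((n, p))
N = W1.size + W2.size                          # total number of parameters

def jacobian(x):
    """d-by-N Jacobian of f(x) = W2 @ tanh(W1 @ x) w.r.t. (W1, W2), by hand."""
    a = np.tanh(W1 @ x)
    JW2 = np.kron(np.eye(d), a)                # df/dW2: output k touches row k of W2
    JW1 = np.kron(W2 * (1 - a**2), x)          # df/dW1: chain rule through tanh
    return np.hstack([JW1, JW2])               # parameters ordered as (vec(W1), vec(W2))

G = np.zeros((N, N))
for i in range(n):                             # naive accumulation over data points
    J = jacobian(X[i])
    G += J.T @ J                               # H_i = I_d for squared loss
print(G.shape, np.allclose(G, G.T))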