We give algorithms to find the following simply described approximation to a given matrix. Given an m × n matrix A with entries between, say, −1 and 1, and an error parameter ε between 0 and 1, we find a matrix D (implicitly) which is the sum of O(1/ε²) simple rank-1 matrices so that the sum of entries of any submatrix (among the 2^{m+n}) of A − D is at most εmn in absolute value. Our algorithm takes time dependent only on ε and the allowed probability of failure (not on m, n). We draw on two lines of research to develop the algorithms: one is built around the fundamental Regularity Lemma of Szemerédi in Graph Theory and the constructive version of Alon, Duke, Lefmann, Rödl and Yuster. The second one is from the papers of Arora, Karger and Karpinski, Fernandez de la Vega and, most directly, Goldwasser, Goldreich and Ron, who develop approximation algorithms for a set of graph problems, typical of which is the maximum cut problem. From our matrix approximation, the above graph algorithms, the Regularity Lemma and several other results follow in a simple way. We generalize our approximations to multi-dimensional arrays and from that derive approximation algorithms for all dense Max-SNP problems.
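As a toy illustration of the decomposition this abstract describes, the sketch below greedily peels rank-1 "cut matrices" d·1_S 1_T^T off a numpy array. The submatrix search uses a simple alternating-sign local search rather than the paper's constant-time sampling procedure, and all names and constants here are ours, not the paper's.

```python
import numpy as np

def greedy_cut_decomposition(A, num_terms=8, sweeps=20):
    """Toy sketch: approximate A by a sum of rank-1 cut matrices
    d * 1_S 1_T^T so that submatrix sums of the residual shrink.
    The pair (S, T) is found by an alternating-sign heuristic,
    NOT by the paper's sampling-based algorithm."""
    R = A.astype(float).copy()
    m, n = R.shape
    terms = []
    for _ in range(num_terms):
        # Local search for a submatrix with large positive sum
        # (a fuller version would also search the negative side).
        T = np.ones(n, dtype=bool)
        for _ in range(sweeps):
            S = R[:, T].sum(axis=1) > 0       # rows with positive sum over T
            T_new = R[S, :].sum(axis=0) > 0   # columns with positive sum over S
            if np.array_equal(T_new, T):
                break
            T = T_new
        if S.sum() == 0 or T.sum() == 0:
            break
        d = R[np.ix_(S, T)].sum() / (S.sum() * T.sum())  # average entry on (S, T)
        terms.append((S.copy(), T.copy(), d))
        R[np.ix_(S, T)] -= d                  # peel off the rank-1 cut matrix
    return terms, R                           # R = A - D is the residual
```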
The paper presents an algorithm for solving Integer Programming problems whose running time depends on the number n of variables in the problem as n^{O(n)}. This is done by reducing an n-variable problem to n^{5/2} problems in n − i variables for some i greater than 1. The factor of n^{5/2} "per variable" improves on the best previously known factor, which is exponential in n. Minkowski's Convex Body theorem and other results from the Geometry of Numbers play a crucial role in the algorithm; they are explained from first principles.
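For readers unfamiliar with the Geometry of Numbers, the theorem this abstract leans on can be stated as follows; this is the standard textbook statement in our phrasing, not quoted from the paper.

```latex
% Minkowski's Convex Body Theorem (standard statement).
\begin{theorem}[Minkowski]
Let $L \subseteq \mathbb{R}^n$ be a full-rank lattice and let
$K \subseteq \mathbb{R}^n$ be a convex set that is symmetric about
the origin. If $\operatorname{vol}(K) > 2^n \det(L)$, then $K$
contains a nonzero point of $L$.
\end{theorem}
```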
In many applications, the data consist of (or may be naturally formulated as) an m × n matrix A. It is often of interest to find a low-rank approximation to A, i.e., an approximation D to the matrix A of rank not greater than a specified rank k, where k is much smaller than m and n. Methods such as the singular value decomposition (SVD) may be used to find an approximation to A which is the best in a well-defined sense. These methods require memory and time which are superlinear in m and n; for many applications in which the data sets are very large this is prohibitive. Two simple and intuitive algorithms are presented which, when given an m × n matrix A, compute a description of a low-rank approximation D* to A, and which are qualitatively faster than the SVD. Both algorithms have provable bounds for the error matrix A − D*. For any matrix X, let ‖X‖_F and ‖X‖_2 denote its Frobenius norm and its spectral norm, respectively. In the first algorithm, c columns of A are randomly chosen. If the m × c matrix C consists of those c columns of A (after appropriate rescaling), then it is shown that approximations to the top singular values and corresponding singular vectors may be computed from C^T C. From the computed singular vectors a description D* of the matrix A may be computed such that rank(D*) ≤ k and such that ‖A − D*‖_ξ² ≤ min_{D: rank(D) ≤ k} ‖A − D‖_ξ² + poly(k, 1/c) ‖A‖_F² holds with high probability for both ξ = 2, F. This algorithm may be implemented without storing the matrix A in random access memory (RAM), provided it can make two passes over the matrix stored in external memory and use O(cm + c²) additional RAM. The second algorithm is similar, except that it further approximates the matrix C by randomly sampling r rows of C to form an r × c matrix W. Thus it has additional error, but it can be implemented in three passes over the matrix using only constant additional RAM. To achieve an additional error (beyond the best rank-k approximation) that is at most ε‖A‖_F², both algorithms take time which is polynomial in k, 1/ε, and log(1/δ), where δ > 0 is a failure probability; the first takes time linear in max(m, n) and the second takes time independent of m and n. Our bounds improve previously published results with respect to the rank parameter k for both the Frobenius and spectral norms. In addition, the proofs for the error bounds use a novel method that makes important use of matrix perturbation theory. The probability distribution over columns of A and the rescaling are crucial features of the algorithms and must be chosen judiciously.
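A minimal numpy sketch of the first algorithm's structure, assuming squared-column-norm sampling as the crucial probability distribution; the function name, the random seed, and the numerical floor are illustrative choices of ours, and error handling is omitted.

```python
import numpy as np

def sampled_low_rank(A, k, c, rng=np.random.default_rng(0)):
    """Sketch: sample c columns of A with probability proportional to
    their squared lengths, rescale, and use the singular vectors of the
    small matrix C (via C^T C) to build a rank-k description of A."""
    m, n = A.shape
    p = (A ** 2).sum(axis=0)
    p = p / p.sum()                      # p_j proportional to |A^(j)|^2
    idx = rng.choice(n, size=c, p=p)
    C = A[:, idx] / np.sqrt(c * p[idx])  # rescaled sampled columns
    # Top-k left singular vectors of C from the small c x c matrix C^T C.
    evals, Y = np.linalg.eigh(C.T @ C)
    order = np.argsort(evals)[::-1][:k]
    sigma = np.sqrt(np.maximum(evals[order], 1e-12))
    H = (C @ Y[:, order]) / sigma        # approximate left singular vectors
    return H                             # D* = H @ (H.T @ A) has rank <= k
```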
Motivated by applications in which the data may be formulated as a matrix, we consider algorithms for several common linear algebra problems. These algorithms make more efficient use of computational resources, such as the computation time, random access memory (RAM), and the number of passes over the data, than do previously known algorithms for these problems. In this paper, we devise two algorithms for the matrix multiplication problem. Suppose A and B (which are m × n and n × p, respectively) are the two input matrices. In our main algorithm, we perform c independent trials, where in each trial we randomly sample an element of {1, 2, ..., n} with an appropriate probability distribution P on {1, 2, ..., n}. We form an m × c matrix C consisting of the sampled columns of A, each scaled appropriately, and we form a c × n matrix R using the corresponding rows of B, again scaled appropriately. The choice of P and the column and row scaling are crucial features of the algorithm. When these are chosen judiciously, we show that CR is a good approximation to AB. More precisely, we show that ‖AB − CR‖_F = O(‖A‖_F ‖B‖_F / √c), where ‖·‖_F denotes the Frobenius norm, i.e., ‖A‖_F² = Σ_{i,j} A_ij². This algorithm can be implemented without storing the matrices A and B in RAM, provided it can make two passes over the matrices stored in external memory and use O(c(m + n + p)) additional RAM to construct C and R. We then present a second matrix multiplication algorithm which is similar in spirit to our main algorithm. In addition, we present a model (the pass-efficient model) in which the efficiency of these and other approximate matrix algorithms may be studied and which we argue is well suited to many applications involving massive data sets. In this model, the scarce computational resources are the number of passes over the data and the additional space and time required by the algorithm. The input matrices may be presented in any order of the entries (and not just row or column order), as is the case in many applications where, e.g., the data have been written by multiple agents. In addition, the input matrices may be presented in a sparse representation, where only the nonzero entries are written.
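A minimal sketch of the sampled product described in this abstract, assuming p_i proportional to ‖A^(i)‖·‖B_(i)‖ and the 1/√(c·p_i) scaling (the features the abstract calls crucial); the function name and seed are ours.

```python
import numpy as np

def approx_matmul(A, B, c, rng=np.random.default_rng(0)):
    """Sketch: approximate AB by CR, where C holds c sampled, rescaled
    columns of A and R the corresponding rescaled rows of B. With the
    1/sqrt(c * p_i) scaling, E[C @ R] = A @ B for any distribution p."""
    n = A.shape[1]
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()              # p_i ~ |A^(i)| * |B_(i)|
    idx = rng.choice(n, size=c, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])
    C = A[:, idx] * scale                # m x c: scaled columns of A
    R = B[idx, :] * scale[:, None]       # c x p: scaled rows of B
    return C @ R                         # unbiased estimate of A @ B
```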