Many modern parallel systems, such as MapReduce, Hadoop, and Spark, can be modeled well by the MPC model. The MPC model captures coarse-grained computation on large data: the data is distributed across processors, each of which has a sublinear (in the input data) amount of memory, and we alternate between rounds of computation and rounds of communication, where each machine can communicate an amount of data as large as the size of its memory. This model is stronger than the classical PRAM model, and it is an intriguing question to design algorithms whose running time is smaller than in the PRAM model.

One fundamental graph problem is connectivity. On an undirected graph with n nodes and m edges, O(log n)-round connectivity algorithms have been known for over 35 years; however, no algorithms with better complexity bounds were known. In this work, we give fully scalable, faster algorithms for the connectivity problem by parameterizing the time complexity as a function of the diameter of the graph. Our main result is an O(log D · log log_{m/n} n)-time connectivity algorithm for diameter-D graphs, using Θ(m) total memory. If our algorithm can use more memory, it can terminate in fewer rounds, and there is no lower bound on the memory per processor.

We extend our results to related graph problems such as spanning forest, finding a DFS sequence, exact/approximate minimum spanning forest, and bottleneck spanning forest. We also show that achieving similar bounds for reachability in directed graphs would imply faster boolean matrix multiplication algorithms.

We introduce several new algorithmic ideas. We describe a general technique called double exponential speed problem size reduction, which roughly means that if we can use total memory N to reduce a problem from size n to n/k, for k = (N/n)^Θ(1), in one phase, then we can solve the problem in O(log log_{N/n} n) phases. In order to achieve this fast reduction for graph connectivity, we use a multistep algorithm. One key step is a carefully constructed truncated broadcasting scheme where each node broadcasts neighbor sets to its neighbors in a way that limits the size of the resulting neighbor sets. Another key step is random leader contraction, where we choose a smaller set of leaders than many previous works do.
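To make the phase-count arithmetic behind double exponential speed problem size reduction concrete, here is a minimal Python sketch of the recurrence, assuming an idealized per-phase shrink factor of exactly k_i = (N/n_i)^c; the constant c, the stopping threshold, and the choice N = 2n below are illustrative assumptions, not taken from the paper:

```python
def phases_to_finish(n, N, c=1.0):
    """Simulate the size-reduction recurrence n_{i+1} = n_i / k_i with
    k_i = (N / n_i)**c: as the instance shrinks, the memory-per-remaining-item
    ratio N/n_i grows, so the shrink factor itself grows every phase and the
    remaining size drops doubly exponentially."""
    assert N > n, "needs some slack between total memory N and problem size n"
    phases = 0
    while n > 1:
        k = (N / n) ** c  # shrink factor available in this phase
        n /= k
        phases += 1
    return phases

# e.g. with N = 2n the phase count grows like log log n, not log n:
for n in (10**6, 10**9, 10**12):
    print(n, phases_to_finish(n, 2 * n))
```

Even as n grows by six orders of magnitude, the phase count barely moves, which is the point of the O(log log_{N/n} n) bound.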
We study the ℓ1-low rank approximation problem, where for a given n × d matrix A and approximation factor α ≥ 1, the goal is to output a rank-k matrix A' for which

‖A' − A‖_1 ≤ α · min_{rank-k matrices A_k} ‖A_k − A‖_1,

where for an n × d matrix C, we let ‖C‖_1 = Σ_{i=1}^n Σ_{j=1}^d |C_{i,j}|. This error measure is known to be more robust than the Frobenius norm in the presence of outliers, can better handle missing data, and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis, and a number of heuristics have been proposed. It was asked in multiple places if there are any approximation algorithms.

We give the first provable approximation algorithms for ℓ1-low rank approximation, showing that it is possible to achieve approximation factor α = (log d)·poly(k) in nnz(A) + (n + d)·poly(k) time, where nnz(A) denotes the number of non-zero entries of A. If k is constant, we further improve the approximation ratio to O(1) with a poly(nd)-time algorithm. Under the Exponential Time Hypothesis, we show there is no poly(nd)-time algorithm achieving a (1 + 1/log^{1+γ}(nd))-approximation, for γ > 0 an arbitrarily small constant, even when k = 1. The latter strengthens the NP-hardness result of [GV15].

We give a number of additional results for ℓ1-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to ℓ_p-norms for 1 ≤ p < 2 and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation. Prior to our work, nothing was known about these variants, which had been studied for Frobenius norm low rank approximation.

Column Subset Selection and CUR Decomposition: In the column subset selection problem, one seeks a small subset C of columns of A for which there is a matrix X such that CX − A is small under some norm. The matrix CX provides a low rank approximation to A which is often more interpretable, since it stores actual columns of A, preserves sparsity, etc. These problems have been extensively studied when the norm is the Frobenius or operator norm (see, e.g., [BMD09, DR10, BDM11] and the references therein). We initiate the study of this problem with respect to the ℓ1-norm. We first prove an existence result, namely, that there exist matrices A for which any subset C of poly(k) columns satisfies min_X ‖CX − A‖_1 ≥ k^{1/2−γ} · min_{rank-k A_k} ‖A_k − A‖_1, where γ > 0 is an arbitrarily small constant. This result is in stark contrast to the Frobenius norm, for which for every matrix there exist O(k/ε) columns for which the approximation factor is 1 + ε. We also show that our bound is nearly optimal in this regime, by showing for every matrix t...
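The entrywise ℓ1 objective is easy to state in code. The numpy sketch below is a toy illustration of the error measure only, not any of the paper's algorithms; the matrix sizes and outlier magnitude are invented, and the truncated SVD is used purely as a Frobenius-optimal baseline to show how a single outlier distorts it under the ℓ1 measure:

```python
import numpy as np

def l1_error(A, B):
    """Entrywise l1 error: ||B - A||_1 = sum_{i,j} |B_ij - A_ij|."""
    return np.abs(B - A).sum()

def rank_k_svd(A, k):
    """Truncated SVD: optimal for Frobenius error, but only a heuristic
    baseline for the l1 objective discussed in the abstract."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

rng = np.random.default_rng(0)
L = np.outer(rng.standard_normal(50), rng.standard_normal(40))  # planted rank-1 matrix
A = L.copy()
A[0, 0] += 100.0                          # one gross outlier
print(l1_error(A, rank_k_svd(A, k=1)))    # Frobenius-optimal fit, pulled toward the outlier
print(l1_error(A, L))                     # the planted rank-1 matrix pays only the outlier: 100
```

Here the single corrupted entry dominates the Frobenius objective, so the SVD fit chases it and loses the planted structure, while under ℓ1 the planted matrix pays only the outlier's magnitude.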
We consider the problem of estimating a Fourier-sparse signal from noisy samples, where the sampling is done over some interval [0, T] and the frequencies can be "off-grid". Previous methods for this problem required the gap between frequencies to be above 1/T, the threshold required to robustly identify individual frequencies. We show that the frequency gap is not necessary to estimate the signal as a whole: for arbitrary k-Fourier-sparse signals under ℓ2-bounded noise, we show how to estimate the signal with a constant-factor growth of the noise and sample complexity polynomial in k and logarithmic in the bandwidth and signal-to-noise ratio. As a special case, we get an algorithm to interpolate degree-d polynomials from noisy measurements, using O(d) samples and increasing the noise by a constant factor in ℓ2.
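For the degree-d polynomial special case, a least-squares fit already conveys the flavor of interpolating from O(d) noisy samples. The Python sketch below is illustrative only: the Chebyshev-spaced nodes, the oversampling constant 4, and the noise level are assumptions of this toy, not the paper's sampling scheme or estimator:

```python
import numpy as np

T, d = 1.0, 8
rng = np.random.default_rng(1)
true_coeffs = rng.standard_normal(d + 1)

# O(d) samples at Chebyshev-spaced points in [0, T] (a standard numerically
# stable choice), perturbed by small l2-bounded noise.
m = 4 * d
x = 0.5 * T * (1 + np.cos(np.pi * (np.arange(m) + 0.5) / m))
y = np.polynomial.polynomial.polyval(x, true_coeffs)
y += 0.01 * rng.standard_normal(m)

# Least-squares fit in the Chebyshev basis for numerical stability.
p = np.polynomial.chebyshev.Chebyshev.fit(x, y, d)

grid = np.linspace(0.0, T, 1000)
err = np.max(np.abs(p(grid) - np.polynomial.polynomial.polyval(grid, true_coeffs)))
print("max deviation on [0, T]:", err)  # on the order of the injected noise
```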
In this paper we provide an O(nd + d^3)-time randomized algorithm for solving linear programs with d variables and n constraints, with high probability. To obtain this result we provide a robust, primal-dual O(√d)-iteration interior point method inspired by the methods of Lee and Sidford (2014, 2019), and show how to efficiently implement this method using new data structures based on heavy hitters, the Johnson–Lindenstrauss lemma, and inverse maintenance. Interestingly, we obtain this running time without using fast matrix multiplication; consequently, barring a major advance in linear system solving, our running time is near optimal for solving dense linear programs among algorithms that don't use fast matrix multiplication.
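For context on where the per-iteration costs come from, here is a textbook log-barrier Newton step for min c^T x subject to Ax ≥ b (n constraints, d variables). This is a plain illustration, not the paper's robust primal-dual method: forming the d × d Hessian below costs O(nd^2) and solving costs O(d^3), which is the kind of per-iteration work the paper's heavy-hitter, sketching, and inverse-maintenance data structures avoid repaying in full each of the O(√d) iterations:

```python
import numpy as np

def log_barrier_newton_step(A, b, c, x, mu):
    """One Newton step on f(x) = c^T x - mu * sum_i log(a_i^T x - b_i),
    the log-barrier objective for the LP  min c^T x  s.t.  Ax >= b.
    Naive costs: gradient O(nd), Hessian A^T S^{-2} A in O(n d^2),
    linear solve in O(d^3)."""
    s = A @ x - b                            # slacks; assumed strictly positive
    grad = c - mu * (A.T @ (1.0 / s))
    hess = mu * (A / s[:, None] ** 2).T @ A  # = mu * A^T diag(s)^{-2} A
    return x - np.linalg.solve(hess, grad)

# Toy usage on a random strictly feasible instance (dimensions illustrative):
rng = np.random.default_rng(2)
n, d = 200, 5
A = rng.standard_normal((n, d))
x0 = rng.standard_normal(d)
b = A @ x0 - 1.0          # guarantees initial slacks A x0 - b = 1 > 0
c = rng.standard_normal(d)
x1 = log_barrier_newton_step(A, b, c, x0, mu=1.0)
print((A @ x1 - b).min())  # slacks after one step (a line search is needed in general)
```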