2016
DOI: 10.1137/140979861

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization

Abstract: We consider the problem of sparse coding, where each sample consists of a sparse linear combination of a set of dictionary atoms, and the task is to learn both the dictionary elements and the mixing coefficients. Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed. Typically, the coefficients are estimated via ℓ1 minimization, keeping the dictionary fixed, and the dictionary is estimated through …
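
To make the alternating scheme described in the abstract concrete, here is a minimal sketch, assuming the standard formulation Y ≈ A·X with a d×n data matrix Y, an overcomplete d×r dictionary A (r > d), and sparse r×n coefficients X. The coefficient step approximates the ℓ1-regularized problem with a few ISTA (soft-thresholding) iterations, and the dictionary step is least squares followed by column renormalization. All names, step sizes, and thresholds below are illustrative assumptions, not the paper's exact procedure or initialization.

import numpy as np

def soft_threshold(z, t):
    # Entrywise soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code_step(Y, A, lam, n_ista=50):
    # Approximate argmin_X 0.5*||Y - A X||_F^2 + lam*||X||_1 via ISTA.
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
    X = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(n_ista):
        grad = A.T @ (A @ X - Y)
        X = soft_threshold(X - grad / L, lam / L)
    return X

def dictionary_step(Y, X, eps=1e-12):
    # Least-squares dictionary update, then renormalize atoms to unit norm.
    A = Y @ np.linalg.pinv(X)
    norms = np.maximum(np.linalg.norm(A, axis=0), eps)
    return A / norms

def alternating_minimization(Y, A0, lam=0.1, n_iters=30):
    # Alternate the two estimation steps, starting from an initial dictionary A0.
    A = A0.copy()
    for _ in range(n_iters):
        X = sparse_code_step(Y, A, lam)
        A = dictionary_step(Y, X)
    return A, X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n, s = 20, 30, 500, 3               # dimensions and per-sample sparsity
    A_true = rng.standard_normal((d, r))
    A_true /= np.linalg.norm(A_true, axis=0)
    X_true = np.zeros((r, n))
    for j in range(n):                        # each sample mixes s random atoms
        idx = rng.choice(r, s, replace=False)
        X_true[idx, j] = rng.standard_normal(s)
    Y = A_true @ X_true
    A0 = A_true + 0.1 * rng.standard_normal((d, r))   # perturbed initialization
    A_hat, X_hat = alternating_minimization(Y, A0)
    print("relative residual:", np.linalg.norm(Y - A_hat @ X_hat) / np.linalg.norm(Y))

The perturbed initialization mirrors the kind of "good starting point" the paper's guarantees assume; from an arbitrary random dictionary the same loop can stall at a spurious stationary point.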

Cited by 81 publications (173 citation statements). References 33 publications.
“…The question of when a nonnegative polynomial has such a "certificate of nonnegativity" was studied by Hilbert, who realized this doesn't always hold and asked (as his 17th problem) whether a nonnegative polynomial is always a sum of squares of rational functions. 1 The book chapter [20] is a good source for several of the known upper and lower bounds, although it does not contain some of the more recent ones. 2 While it is common in the TCS community to use Lasserre to describe the primal version of this SDP and Sum-of-Squares (SOS) to describe the dual, in this paper we use the more descriptive SOS name for both programs.…”
Section: The Sum-of-Squares Hierarchy (mentioning)
confidence: 99%
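
As a concrete illustration of the certificate question raised in this excerpt (a standard textbook example, not taken from the cited works): the first polynomial below is nonnegative with an explicit sum-of-squares certificate, whereas the Motzkin polynomial m(x, y) is nonnegative (by the AM-GM inequality applied to x^4y^2, x^2y^4, and 1) yet is provably not a sum of squares of polynomials; by Artin's resolution of Hilbert's 17th problem it is nevertheless a sum of squares of rational functions.

\[
  x^2 - 2xy + 2y^2 \;=\; (x - y)^2 + y^2 \;\ge\; 0,
  \qquad
  m(x, y) \;=\; x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1 \;\ge\; 0 .
\]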
“…There are several strong lower bounds (also known as integrality gaps) for these hierarchies, in particular showing that ω(1) levels (and often even n^{Ω(1)} or Ω(n) levels) of many such hierarchies can't improve by much on the known polynomial-time approximation guarantees for many NP-hard problems, including SAT, Independent-Set, Max-Cut and more [28,27,5,21,47,52,19,14,15]. Unfortunately, there are many fewer positive results, and many of them only show that these hierarchies can match the performance of previously known (and often more efficient) methods, or give algorithms that can be converted into something much more combinatorial, rather than using hierarchies to get genuinely new algorithmic results.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, there has been a resurgence of interest in methods based on alternating minimization, as numerous authors have shown that alternating minimization (suitably initialized, and under a few technical assumptions) provably converges to the global minimum for a range of problems including matrix completion [Kes12, JNS13, Har13], robust PCA [NNS+14], and dictionary learning [AAJN13].…”
Section: Gordon's Generalized (mentioning)
confidence: 99%
“…We then show that, despite being a nonconvex objective, all local minima are global minima, under minimal conditions. We avoid the need for careful initialization strategies needed for previous optimality results for sparse coding [Agarwal et al., 2014; Arora et al., 2015], using recent results for more general dictionary learning settings [Haeffele and Vidal, 2015; Le and White, 2017], particularly by extending beyond smooth regularizers using Γ-convergence. Using this insight, we provide a simple alternating proximal gradient algorithm and demonstrate the utility of learning supervised sparse coding representations versus unsupervised sparse coding and a variety of tile-coding representations.…”
Section: Introduction (mentioning)
confidence: 99%
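
The alternating proximal gradient algorithm mentioned in the last excerpt can be sketched generically as follows. This is a minimal illustration of the general technique (one proximal gradient step per block per pass), not the cited papers' exact method; the objective, step sizes, and unit-norm atom constraint are assumptions made for the sketch.

import numpy as np

def alt_prox_grad_pass(Y, A, X, lam=0.1):
    # One pass for  min_{A,X} 0.5*||Y - A X||_F^2 + lam*||X||_1,
    # with each dictionary atom constrained to the unit Euclidean ball.

    # Block X: gradient step on the smooth term, then the l1 proximal operator
    # (soft-thresholding), with step size 1/L_x.
    L_x = np.linalg.norm(A, 2) ** 2 + 1e-12
    X = X - (A.T @ (A @ X - Y)) / L_x
    X = np.sign(X) * np.maximum(np.abs(X) - lam / L_x, 0.0)

    # Block A: gradient step, then project each atom onto the unit ball
    # (the proximal operator of the norm-ball indicator function).
    L_a = np.linalg.norm(X, 2) ** 2 + 1e-12
    A = A - ((A @ X - Y) @ X.T) / L_a
    norms = np.maximum(np.linalg.norm(A, axis=0), 1.0)
    A = A / norms
    return A, X

Repeating alt_prox_grad_pass until the iterates stabilize gives the kind of simple alternating scheme the excerpt describes; a supervised variant would typically add a prediction-loss term to the smooth part of the objective.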