Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix FF T , F = f (W X), where W is a random weight matrix, X is a random data matrix, and f is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature networks on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.
Minimax optimization has become a central tool for modern machine learning with applications in robust optimization, game theory and training GANs. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties posed by nonconvex-nonconcave structures. We break this historical barrier by identifying three regions of nonconvex-nonconcave bilinear minimax problems and characterizing their different solution paths. For problems where the interaction between the agents is sufficiently strong, we derive global linear convergence guarantees. Conversely when the interaction between the agents is fairly weak, we derive local linear convergence guarantees. Between these two settings, we show that limiting cycles may occur, preventing the convergence of the solution path.
For a predicate f :, we call the predicate strongly approximation resistant if given a near-satisfiable instance of CSP(f ), it is computationally hard to find an assignment such that the fraction of constraints satisfied is outside theWe present a characterization of strongly approximation resistant predicates under the Unique Games Conjecture. We also present characterizations in the mixed linear and semi-definite programming hierarchy and the Sherali-Adams linear programming hierarchy. In the former case, the characterization coincides with the one based on UGC. Each of the two characterizations is in terms of existence of a probability measure on a natural convex polytope associated with the predicate. The predicate is called approximation resistant if given a near-satisfiable instance of CSP(f ), it is computationally hard to find an assignment such that the fraction of constraints satisfied is at least ρ(f ) + Ω(1). When the predicate is odd, i.e. f (−z) = 1 − f (z), ∀z ∈ {−1, 1} k , it is easily observed that the notion of approximation resistance coincides with that of strong approximation resistance. Hence for odd predicates our characterization of strong approximation resistance is also a characterization of approximation resistance.
We consider the complexity of LS+ refutations of unsatisfiable instances of Constraint Satisfaction Problems (k-CSPs) when the underlying predicate supports a pairwise independent distribution on its satisfying assignments. This is the most general condition on the predicates under which the corresponding MAX k-CSP problem is known to be approximation resistant.We show that for random instances of such k-CSPs on n variables, even after Ω(n) rounds of the LS + hierarchy, the integrality gap remains equal to the approximation ratio achieved by a random assignment. In particular, this also shows that LS + refutations for such instances require rank Ω(n). We also show the stronger result that refutations for such instances in the static LS+ proof system requires size exp(Ω(n)).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.