The binary (or Ising) perceptron is a toy model of a single-layer neural network and can be viewed as a random constraint satisfaction problem with a high degree of connectivity. The model and its symmetric variant, the symmetric binary perceptron (SBP), have been studied widely in statistical physics, mathematics, and machine learning.

The SBP exhibits a dramatic statistical-to-computational gap: the densities at which known efficient algorithms find solutions are far below the threshold for the existence of solutions. Furthermore, the SBP exhibits a striking structural property: at all positive constraint densities, almost all of its solutions are 'totally frozen' singletons separated by large Hamming distances [PX21, ALS21b]. This suggests that finding a solution to the SBP may be computationally intractable. At the same time, however, the SBP does admit polynomial-time search algorithms at low enough densities. A conjectural explanation for this conundrum was put forth in [BDVLZ20]: efficient algorithms succeed in the face of freezing by finding exponentially rare clusters of large size. However, it was discovered recently that such rare large clusters exist at all subcritical densities, even at those well above the limits of known efficient algorithms [ALS21a]. Thus the driver of the statistical-to-computational gap exhibited by this model remains a mystery.

In this paper, we conduct a different landscape analysis to explain the algorithmic tractability of this problem. We show that at high enough densities the SBP exhibits the multi-Overlap Gap Property (m-OGP), an intricate geometrical property known to be a rigorous barrier for large classes of algorithms. Our analysis shows that the m-OGP threshold (a) is well below the satisfiability threshold, and (b) matches the best known algorithmic threshold up to logarithmic factors as m → ∞. We then prove that the m-OGP rules out the class of stable algorithms for the SBP above this threshold. We conjecture that the m → ∞ limit of the m-OGP threshold marks the algorithmic threshold for the problem. Furthermore, we investigate the stability of known efficient algorithms for perceptron models and show that the Kim-Roche algorithm [KR98], devised for the asymmetric binary perceptron, is stable in the sense we consider.
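For concreteness, here is a minimal numerical sketch of the SBP in its standard formulation: a point x ∈ {−1, +1}^n is a solution if |⟨g_i, x⟩| ≤ κ√n for every one of the m = αn i.i.d. Gaussian constraint vectors g_i. The parameters and the brute-force enumeration below are purely illustrative; they are not the algorithms or thresholds studied in the paper.

```python
import numpy as np
from itertools import product

# Illustrative parameters only (not tied to any threshold from the paper).
n, alpha, kappa = 12, 0.5, 1.0
m = int(alpha * n)

rng = np.random.default_rng(0)
G = rng.standard_normal((m, n))          # i.i.d. Gaussian constraint vectors g_i

def is_sbp_solution(x):
    """x in {-1,+1}^n is an SBP solution if |<g_i, x>| <= kappa*sqrt(n) for all i."""
    return np.all(np.abs(G @ x) <= kappa * np.sqrt(n))

# Brute-force enumeration of the hypercube, feasible only for tiny n; the point
# of the paper is precisely that efficient search becomes hard at high densities.
num_solutions = sum(is_sbp_solution(np.array(x)) for x in product([-1, 1], repeat=n))
print(f"{num_solutions} solutions out of {2**n} points")
```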
We focus on the high-dimensional linear regression (HDR) problem, where the algorithmic goal is to efficiently recover an unknown feature vector β* ∈ R^p from its linear measurements, using a small number n of measurements. Unlike most of the literature on this model, we make no sparsity assumption on β*, but instead adopt a different regularization: (a) In the noiseless setting, we assume the entries of β* are either rational numbers with a common denominator Q ∈ Z_+ (referred to as Q-rationality) or irrational numbers supported on a rationally independent set of bounded cardinality known to the learner; collectively, we call this the mixed-support assumption. Using a novel combination of the PSLQ integer relation algorithm and the LLL lattice basis reduction algorithm, we propose a polynomial-time algorithm that exactly recovers a β* ∈ R^p satisfying the mixed-support assumption from its linear measurements Y = Xβ* ∈ R^n, for a large class of distributions for the random entries of X, even with a single measurement (n = 1). We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem, where a β* ∈ R^p is to be recovered from magnitude-only measurements Y = |⟨X, β*⟩|.
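As a toy illustration of why a single noiseless measurement can suffice, consider the Q = 1 case of the Q-rationality assumption (integer entries with a known bound B). The sketch below only checks identifiability by exhaustive search over the integer grid; the paper's actual algorithm replaces this search with the PSLQ integer relation and LLL lattice basis reduction algorithms to achieve polynomial time. The bound B and the Gaussian design are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Identifiability check only -- not the paper's PSLQ/LLL-based algorithm.
rng = np.random.default_rng(0)
p, B = 4, 2
beta_star = rng.integers(-B, B + 1, size=p)   # hidden integer vector (Q = 1 case)
X = rng.standard_normal(p)                    # a single measurement row (n = 1)
Y = X @ beta_star                             # one noiseless linear measurement

matches = [b for b in product(range(-B, B + 1), repeat=p)
           if abs(X @ np.array(b) - Y) < 1e-9]
print(beta_star, matches)                     # almost surely a unique match
```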
We consider the problem of learning shallow noiseless neural networks with a quadratic activation function and planted weights W* ∈ R^{m×d}, where m is the width of the hidden layer and d ≤ m is the dimension of the data, which consists of centered i.i.d. coordinates with second moment µ_2 and fourth moment µ_4. We provide an analytical formula for the population risk L(W) of any W ∈ R^{m×d} in terms of µ_2, µ_4, and the distance of W from W*. We establish that the landscape of the population risk L(W) admits an energy barrier separating rank-deficient solutions from the global optimum: if W ∈ R^{m×d} with rank(W) < d, then L(W) is bounded away from zero by an amount we quantify. We then establish that all full-rank stationary points of L(·) are necessarily global optima. These two results suggest a simple explanation for the success of gradient descent in training such networks: when properly initialized, gradient descent finds a global optimum due to the absence of spurious stationary points within the set of full-rank matrices.

We then show that if the planted weight matrix W* ∈ R^{m×d} has centered i.i.d. entries with unit variance and finite fourth moment (while the data still has centered i.i.d. coordinates as above), and the network is sufficiently wide, that is, m > Cd^2 for a large enough constant C, then it is easy to construct a full-rank matrix W with population risk below the aforementioned energy barrier; starting from such a W, gradient descent is guaranteed to converge to a global optimum.

Our final focus is on sample complexity: we identify a simple necessary and sufficient geometric condition on the training data under which any minimizer of the empirical loss necessarily has zero generalization error. We show that as soon as n ≥ n* = d(d + 1)/2, randomly generated data enjoys this geometric condition almost surely. At the same time, we show that if n < n*, then when the data has centered i.i.d. coordinates, there always exists a matrix W with empirical risk equal to zero but with population risk bounded away from zero by the same amount as for rank-deficient matrices.

Our results on sample complexity further shed light on an interesting phenomenon observed empirically about neural networks: we show that, for networks with quadratic activations, overparametrization does not hurt generalization once the data is interpolated.
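A small numerical sketch of the sample-complexity threshold n* = d(d + 1)/2 (an illustration, not the paper's argument): with quadratic activations the network output is f(x; W) = Σ_j ⟨w_j, x⟩^2 = x^T (W^T W) x, so the data constrains W only through the symmetric matrix S = W^T W, and interpolation pins S down exactly when the linear map S ↦ (x_i^T S x_i)_i has rank d(d + 1)/2, the dimension of the space of symmetric d × d matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
n_star = d * (d + 1) // 2                      # = 15 for d = 5

def design_row(x):
    """Row of the linear map S -> x^T S x in the upper-triangular parametrization."""
    outer = np.outer(x, x)
    iu = np.triu_indices(len(x), k=1)
    return np.concatenate([np.diag(outer), 2 * outer[iu]])

for n in (n_star - 1, n_star):
    X = rng.standard_normal((n, d))            # centered i.i.d. data
    A = np.vstack([design_row(x) for x in X])
    print(n, np.linalg.matrix_rank(A))         # reaches full rank d(d+1)/2 iff n >= n*
```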
We study the average-case hardness of the algorithmic problem of exactly computing the partition function associated with the Sherrington-Kirkpatrick model of spin glasses with Gaussian couplings and random external field. We establish that unless P = #P, there does not exist a polynomial-time algorithm that exactly computes the partition function on average. Specifically, we show that if there exists a polynomial-time algorithm that exactly computes the partition function for an inverse-polynomial fraction (1/n^{O(1)}) of all inputs, then there is a polynomial-time algorithm that exactly computes the partition function for all inputs with high probability, yielding P = #P.

The ingredients of our proof include the random and downward self-reducibility of the partition function with random external field; an argument of Cai et al. [CPS99] for establishing the average-case hardness of computing the permanent of a matrix; a list-decoding algorithm of Sudan [Sud96] for reconstructing polynomials that intersect a given list of numbers at sufficiently many points; and the near-uniformity of the log-normal distribution modulo a large prime p.
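To make the object of study concrete, the sketch below evaluates the partition function of a tiny instance by brute force, under one common normalization of the SK Hamiltonian with couplings J_ij ~ N(0, 1/n) and external field h_i ~ N(0, 1); the paper concerns the hardness of computing this quantity exactly on average at large n, where the 2^n-term sum is infeasible.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, beta = 8, 1.0
J = np.triu(rng.standard_normal((n, n)), k=1) / np.sqrt(n)   # Gaussian couplings, i < j
h = rng.standard_normal(n)                                    # random external field

# Z(beta) = sum over sigma in {-1,+1}^n of exp(beta * sum_{i<j} J_ij s_i s_j + sum_i h_i s_i)
Z = sum(np.exp(beta * (s @ J @ s) + h @ s)
        for s in (np.array(sig) for sig in product([-1, 1], repeat=n)))
print(Z)
```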