Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.

Detecting communities or modules is a central task in the study of social, biological, and technological networks. Two of the most popular approaches are statistical inference, where we fit a generative model such as the stochastic block model to the network (1, 2), and spectral methods, where we classify vertices according to the eigenvectors of a matrix associated with the network, such as its adjacency matrix or Laplacian (3).

Both statistical inference and spectral methods have been shown to work well in networks that are sufficiently dense, or when the graph is regular (4-8). However, for sparse networks with widely varying degrees, the community detection problem is harder. Indeed, it was recently shown (9-11) that there is a phase transition below which communities present in the underlying block model are impossible for any algorithm to detect.
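The nonbacktracking approach described above can be sketched in a few lines. This is a minimal illustration, assuming numpy and a two-group planted partition in which vertices in the same group are joined with probability a/n and vertices in different groups with probability b/n. The helper names (`sample_planted_partition`, `nonbacktracking_matrix`, `nonbacktracking_labels`) are hypothetical, the dense eigendecomposition only suits small graphs, and turning edge values into vertex labels by summing over the edges that point into each vertex is one simple choice rather than a definitive reproduction of the authors' procedure.

```python
import numpy as np


def sample_planted_partition(n, a, b, rng):
    """Hypothetical helper: two equal groups, edge probability a/n inside a group, b/n across."""
    labels = np.repeat([0, 1], n // 2)
    same_group = labels[:, None] == labels[None, :]
    probs = np.where(same_group, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # keep each unordered pair once
    edges = [(int(u), int(v)) for u, v in np.argwhere(upper)]
    return edges, labels


def nonbacktracking_matrix(edges):
    """B[(u->v), (v->x)] = 1 for every neighbour x of v with x != u; all other entries are 0."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: k for k, e in enumerate(directed)}
    neighbors = {}
    for u, v in directed:
        neighbors.setdefault(u, []).append(v)
    B = np.zeros((len(directed), len(directed)))
    for (u, v), k in index.items():
        for x in neighbors[v]:
            if x != u:  # forbid stepping straight back along the edge just used
                B[k, index[(v, x)]] = 1.0
    return B, directed


def nonbacktracking_labels(n, edges):
    """Split vertices by the sign of the eigenvector whose eigenvalue has the 2nd-largest real part."""
    B, directed = nonbacktracking_matrix(edges)
    vals, vecs = np.linalg.eig(B)
    order = np.argsort(-vals.real)
    edge_vector = vecs[:, order[1]].real
    score = np.zeros(n)
    for (u, v), value in zip(directed, edge_vector):
        score[v] += value  # aggregate over the edges pointing into v
    return (score > 0).astype(int), vals.real[order[:2]]


rng = np.random.default_rng(0)
n, a, b = 100, 20, 2
edges, truth = sample_planted_partition(n, a, b, rng)
predicted, top_two = nonbacktracking_labels(n, edges)
# Group labels are only defined up to swapping the two groups.
overlap = max(np.mean(predicted == truth), np.mean(predicted != truth))
print("two leading eigenvalues:", top_two, "bulk radius ~", np.sqrt((a + b) / 2))
print("fraction correctly grouped:", overlap)
```

For this model and large n, the two leading real eigenvalues of B are expected near the average degree (a+b)/2 and near (a-b)/2, with the remaining bulk roughly inside a disk of radius sqrt((a+b)/2); the parameters here are deliberately far above the detectability threshold so that a small dense example behaves well.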
Whereas standard spectral algorithms succeed down to this transition when the network is sufficiently dense, with an average degree growing as a function of network size (8), in the case where the average degree is constant these methods fail significantly above the transition (12). Thus, there is a large regime in which statistical inference succeeds in detecting communities but current spectral algorithms fail.

It was conjectured in ref. 11 that this gap is artificial and that there exists a spectral algorithm that succeeds all of the way to the detectability transition even in the sparse case. Here, we propose an algorithm based on a linear operator considerably different from the adjacency matrix or its variants: namely, a matrix that represents a walk on the directed edges of the network, with backtracking prohibited. We give strong evidence that this algorithm indeed closes the gap.

The fact that this operator has better spectral properties than, for instance, the standard random walk operator has been used in the past in the context of random matrices and random graphs (13-15). In the theory of zeta functions of graphs, it is known as the edge adjacency operator, or the Hashimoto matrix...
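In the theory of graph zeta functions, a standard identity for this matrix is the Ihara-Bass formula: for a graph with $n$ vertices, $m$ edges, adjacency matrix $A$ and diagonal degree matrix $D$, $\det(I - uB) = (1-u^2)^{m-n}\det(I - uA + u^2(D - I))$. One practical consequence is that, apart from extra eigenvalues $\pm 1$, the spectrum of the $2m \times 2m$ matrix $B$ matches that of the $2n \times 2n$ matrix with blocks $[[0, D - I], [-I, A]]$. The sketch below, assuming numpy, checks both statements numerically on a small made-up graph; the graph and the helper names are illustrative assumptions, not taken from the source.

```python
import numpy as np


def nonbacktracking_matrix(edges):
    """Hashimoto matrix: B[(u->v), (v->x)] = 1 whenever x != u."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: k for k, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for (u, v), k in index.items():
        for (w, x), j in index.items():
            if w == v and x != u:  # continue from v, but not back to u
                B[k, j] = 1.0
    return B


def adjacency_and_degree(n, edges):
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A, np.diag(A.sum(axis=1))


def reduced_matrix(A, D):
    """2n x 2n matrix; its characteristic polynomial is that of B divided by (x^2 - 1)^(m - n)."""
    n = A.shape[0]
    I = np.eye(n)
    return np.block([[np.zeros((n, n)), D - I], [-I, A]])


# Illustrative graph: two triangles sharing vertex 2, plus a pendant vertex 5.
n = 6
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5)]
m = len(edges)
B = nonbacktracking_matrix(edges)
A, D = adjacency_and_degree(n, edges)

u = 0.3  # any value works; the identity holds for all u
lhs = np.linalg.det(np.eye(2 * m) - u * B)
rhs = (1 - u**2) ** (m - n) * np.linalg.det(np.eye(n) - u * A + u**2 * (D - np.eye(n)))
rho_B = np.max(np.abs(np.linalg.eigvals(B)))
rho_reduced = np.max(np.abs(np.linalg.eigvals(reduced_matrix(A, D))))
print("Ihara-Bass sides:", lhs, rhs)
print("spectral radii:", rho_B, rho_reduced)
```

The reduced matrix is what makes the operator cheap to use in practice: its size depends on the number of vertices rather than on the number of directed edges.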
In this paper we study functions with low influences on product probability spaces. These are functions $f\colon \Omega_1 \times \cdots \times \Omega_n \to \mathbb{R}$ that have $\mathbb{E}[\mathrm{Var}_i[f]]$ small compared to $\mathrm{Var}[f]$ for each $i$. The analysis of boolean functions $f\colon \{-1,1\}^n \to \{-1,1\}$ with low influences has become a central problem in discrete Fourier analysis. It is motivated by fundamental questions arising from the construction of probabilistically checkable proofs in theoretical computer science and from problems in the theory of social choice in economics.

We prove an invariance principle for multilinear polynomials with low influences and bounded degree; it shows that under mild conditions the distribution of such polynomials is essentially invariant for all product spaces. Ours is one of the very few known nonlinear invariance principles. It has the advantage that its proof is simple and that its error bounds are explicit. We also show that the assumption of bounded degree can be eliminated if the polynomials are slightly "smoothed"; this extension is essential for our applications to "noise stability"-type problems.

In particular, as applications of the invariance principle we prove two conjectures: Khot, Kindler, Mossel, and O'Donnell's "Majority Is Stablest" conjecture from theoretical computer science, which was the original motivation for this work, and Kalai and Friedgut's "It Ain't Over Till It's Over" conjecture from social choice theory.
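To make these definitions concrete, the following sketch computes, by brute force over $\{-1,1\}^n$ with the uniform measure, the influence $\mathbb{E}[\mathrm{Var}_i[f]]$ and the noise stability $\mathbb{E}[f(x)f(y)]$, where each $y_i$ equals $x_i$ with probability $(1+\rho)/2$ independently. It assumes numpy, and the function names are our own. It contrasts a dictator, which is very stable but has one influence as large as $\mathrm{Var}[f] = 1$, with majority, whose influences shrink as $n$ grows and whose stability approaches $(2/\pi)\arcsin\rho$ (Sheppard's formula).

```python
import itertools

import numpy as np


def cube(n):
    """All points of {-1, 1}^n, one per row."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))


def evaluate(f, points):
    return np.array([f(x) for x in points], dtype=float)


def influence(f, n, i):
    """E[Var_i f]: with the other coordinates fixed, Var over x_i is ((f(x) - f(x with x_i flipped)) / 2)^2."""
    points = cube(n)
    flipped = points.copy()
    flipped[:, i] *= -1
    diff = evaluate(f, points) - evaluate(f, flipped)
    return float(np.mean((diff / 2) ** 2))


def noise_stability(f, n, rho):
    """E[f(x) f(y)] for uniform x and rho-correlated y, summed exactly."""
    points = cube(n)
    values = evaluate(f, points)
    agree = (n + points @ points.T) // 2  # number of coordinates where x and y agree
    weights = ((1 + rho) / 2) ** agree * ((1 - rho) / 2) ** (n - agree)  # P(y | x)
    return float(values @ weights @ values / 2**n)


def majority(x):
    return 1.0 if x.sum() > 0 else -1.0  # n is assumed odd, so there are no ties


def dictator(x):
    return float(x[0])


for n in (3, 5, 7):
    print("majority", n, influence(majority, n, 0), noise_stability(majority, n, 0.5))
print("dictator", influence(dictator, 5, 0), influence(dictator, 5, 1), noise_stability(dictator, 5, 0.5))
print("limit for majority at rho = 0.5:", 2 / np.pi * np.arcsin(0.5))
```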
In this paper we show a reduction from the Unique Games problem to the problem of approximating MAX-CUT to within a factor of $\alpha_{\mathrm{GW}} + \epsilon$, for all $\epsilon > 0$; here $\alpha_{\mathrm{GW}} \approx .878567$ denotes the approximation ratio achieved by the Goemans-Williamson algorithm [26]. This implies that if the Unique Games Conjecture of Khot [37] holds then the Goemans-Williamson approximation algorithm is optimal. Our result indicates that the geometric nature of the Goemans-Williamson algorithm might be intrinsic to the MAX-CUT problem.

Our reduction relies on a theorem we call Majority Is Stablest. This was introduced as a conjecture in the original version of this paper, and was subsequently confirmed in [45]. A stronger version of this conjecture called Plurality Is Stablest is still open, although [45] contains a proof of an asymptotic version of it.

Our techniques extend to several other two-variable constraint satisfaction problems. In particular, subject to the Unique Games Conjecture, we show tight or nearly tight hardness results for MAX-2SAT, MAX-q-CUT, and MAX-2LIN(q).

For MAX-2SAT we show approximation hardness up to a factor of roughly .943. This nearly matches the .940 approximation algorithm of Lewin, Livnat, and Zwick [41]. Furthermore, we show that our .943... factor is actually tight for a slightly restricted version of MAX-2SAT. For MAX-q-CUT we show a hardness factor which asymptotically (for large $q$) matches the approximation factor achieved by Frieze and Jerrum [25], namely $1 - 1/q + 2(\ln q)/q^2$.

For MAX-2LIN(q) we show hardness of distinguishing between instances which are $(1-\epsilon)$-satisfiable and those which are not even, roughly, $(q^{-\epsilon/2})$-satisfiable. These parameters almost match those achieved by the recent algorithm of Charikar, Makarychev, and Makarychev [10]. The hardness result holds even for instances in which all equations are of the form $x_i - x_j = c$. At a more qualitative level, this result also implies that $1 - \epsilon$ vs. $\epsilon$ hardness for MAX-2LIN(q) is equivalent to the Unique Games Conjecture.
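The constant $\alpha_{\mathrm{GW}}$ comes from the random-hyperplane rounding in the Goemans-Williamson algorithm: an edge whose two unit vectors meet at angle $\theta$ is cut with probability $\theta/\pi$, while it contributes $(1-\cos\theta)/2$ to the semidefinite relaxation, so the guarantee is the worst-case ratio of the two. A short numerical sketch, assuming numpy and a plain grid search (the grid size is an arbitrary choice), recovers the value quoted above.

```python
import numpy as np

# Angles between the two unit vectors of an edge; start just above 0, where both terms vanish.
theta = np.linspace(1e-6, np.pi, 2_000_001)
cut_probability = theta / np.pi             # chance a random hyperplane separates the endpoints
sdp_contribution = (1 - np.cos(theta)) / 2  # value of the edge in the relaxation
ratio = cut_probability / sdp_contribution

k = np.argmin(ratio)
alpha_gw, theta_star = ratio[k], theta[k]
print(f"alpha_GW ~ {alpha_gw:.6f} at theta ~ {theta_star:.4f} rad")
```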
The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $n$ nodes with two equal-sized clusters, with a between-class edge probability of $q$ and a within-class edge probability of $p$. Although most of the literature on this model has focused on the case of increasing degrees (i.e., $pn, qn \to \infty$ as $n \to \infty$), the sparse case $p, q = O(1/n)$ is interesting both from a mathematical and an applied point of view.

A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $p = a/n$ and $q = b/n$, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $(a - b)^2 > 2(a + b)$, and impossible if $(a - b)^2 < 2(a + b)$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $(a - b)^2 > C(a + b)$ for some sufficiently large $C$.

We prove half of their prediction, showing that it is indeed impossible to cluster if $(a - b)^2 < 2(a + b)$. Furthermore, we show that it is impossible even to estimate the model parameters from the graph when $(a - b)^2 < 2(a + b)$; on the other hand, we provide a simple and efficient algorithm for estimating $a$ and $b$ when $(a - b)^2 > 2(a + b)$. Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem. This connection points to fascinating applications and open problems.

* Supported by NSF grant DMS-1106999 and DOD ONR grant N000141110140.
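The threshold in the Decelle et al. conjecture is easy to evaluate for concrete parameters. The sketch below (plain Python; the function names are our own) classifies $(a, b)$ pairs relative to $(a-b)^2 = 2(a+b)$ and checks the equivalent form $|a-b|/2 > \sqrt{(a+b)/2}$, obtained by dividing both sides by 4 and taking square roots. In that form, $|a-b|/2$ is the difference between a vertex's expected numbers of same-cluster and other-cluster neighbours, and $(a+b)/2$ is the average degree.

```python
import math


def ks_margin(a, b):
    """(a - b)^2 - 2(a + b) for p = a/n, q = b/n; its sign says which side of the threshold we are on."""
    return (a - b) ** 2 - 2 * (a + b)


def regime(a, b):
    margin = ks_margin(a, b)
    if margin > 0:
        return "above threshold"
    if margin < 0:
        return "below threshold"
    return "at threshold"


def spectral_form(a, b):
    """Equivalent test: expected same-cluster minus other-cluster neighbours vs sqrt(average degree)."""
    return abs(a - b) / 2 > math.sqrt((a + b) / 2)


for a, b in [(5, 1), (6, 2), (3, 1), (20, 2)]:
    print(a, b, "average degree", (a + b) / 2, "->", regime(a, b))
```

Below the threshold the abstract's impossibility result applies; above it, the abstract gives a parameter-estimation algorithm, while clustering itself is the conjectured part.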