Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering. Detecting communities or modules is a central task in the study of social, biological, and technological networks. Two of the most popular approaches are statistical inference, where we fit a generative model such as the stochastic block model to the network (1, 2); and spectral methods, where we classify vertices according to the eigenvectors of a matrix associated with the network such as its adjacency matrix or Laplacian (3). Both statistical inference and spectral methods have been shown to work well in networks that are sufficiently dense, or when the graph is regular (4-8). However, for sparse networks with widely varying degrees, the community detection problem is harder. Indeed, it was recently shown (9-11) that there is a phase transition below which communities present in the underlying block model are impossible for any algorithm to detect.
Whereas standard spectral algorithms succeed down to this transition when the network is sufficiently dense, with an average degree growing as a function of network size (8), in the case where the average degree is constant these methods fail significantly above the transition (12). Thus, there is a large regime in which statistical inference succeeds in detecting communities, but where current spectral algorithms fail. It was conjectured in ref. 11 that this gap is artificial and that there exists a spectral algorithm that succeeds all of the way to the detectability transition even in the sparse case. Here, we propose an algorithm based on a linear operator considerably different from the adjacency matrix or its variants: namely, a matrix that represents a walk on the directed edges of the network, with backtracking prohibited. We give strong evidence that this algorithm indeed closes the gap. The fact that this operator has better spectral properties than, for instance, the standard random walk operator, has been used in the past in the context of random matrices and random graphs (13-15). In the theory of zeta functions of graphs, it is known as the edge adjacency operator, or the Hashimoto matrix...
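As a rough illustration of the operator described above, the following sketch builds a nonbacktracking matrix for a small undirected graph. It assumes the convention that the entry for the pair of directed edges (u→v, w→x) is 1 when v = w and x ≠ u, and 0 otherwise. The function names `nonbacktracking_matrix` and `leading_eigenvalues` are hypothetical, and the quadratic-time construction is meant for small examples only, not as the authors' implementation.

```python
import numpy as np


def nonbacktracking_matrix(edges):
    """Build a nonbacktracking matrix for an undirected simple graph.

    `edges` is a list of (u, v) pairs. Each undirected edge yields two
    directed edges. Assumed convention: B[(u->v), (w->x)] = 1 when
    v == w and x != u, so a walk may continue along any edge leaving v
    except the one that returns to u.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    size = len(directed)
    B = np.zeros((size, size))
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            if v == w and x != u:
                B[i, j] = 1.0
    return B, directed


def leading_eigenvalues(B, k):
    """Return the k eigenvalues of B with the largest magnitude."""
    values = np.linalg.eigvals(B)
    order = np.argsort(-np.abs(values))
    return values[order[:k]]
```

With this convention, the row for a directed edge u→v sums to deg(v) − 1, which gives a quick sanity check on small graphs. On a triangle, for example, every row sums to 1 and the matrix is a permutation of the six directed edges.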
The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $n$ nodes with two equal-sized clusters, with a between-class edge probability of $q$ and a within-class edge probability of $p$. Although most of the literature on this model has focused on the case of increasing degrees (i.e., $pn, qn \to \infty$ as $n \to \infty$), the sparse case $p, q = O(1/n)$ is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $p = a/n$ and $q = b/n$, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $(a - b)^2 > 2(a + b)$, and impossible if $(a - b)^2 < 2(a + b)$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $(a - b)^2 > C(a + b)$ for some sufficiently large $C$. We prove half of their prediction, showing that it is indeed impossible to cluster if $(a - b)^2 < 2(a + b)$. Furthermore, we show that it is impossible even to estimate the model parameters from the graph when $(a - b)^2 < 2(a + b)$; on the other hand, we provide a simple and efficient algorithm for estimating $a$ and $b$ when $(a - b)^2 > 2(a + b)$. Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem. This connection points to fascinating applications and open problems. * Supported by NSF grant DMS-1106999 and DOD ONR grant N000141110140.
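To make the threshold concrete, here is a minimal sketch that samples a sparse two-cluster planted partition graph with within-class probability a/n and between-class probability b/n, and checks which side of the conjectured threshold (a − b)^2 = 2(a + b) a parameter pair falls on. The names `is_above_threshold` and `sample_planted_partition` are hypothetical, and assigning clusters by index parity is an illustrative choice, not something taken from the paper.

```python
import random


def is_above_threshold(a, b):
    """True when (a - b)^2 > 2(a + b), the regime where clustering
    correlated with the true partition is conjectured to be possible."""
    return (a - b) ** 2 > 2 * (a + b)


def sample_planted_partition(n, a, b, seed=None):
    """Sample a two-cluster planted partition graph on n nodes.

    Nodes are assigned to clusters by index parity (an illustrative
    choice), so the clusters have equal size when n is even. Pairs in
    the same cluster are joined with probability a/n, pairs in
    different clusters with probability b/n.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    p, q = a / n, b / n
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.append((i, j))
    return labels, edges
```

For instance, a = 5, b = 1 gives (a − b)^2 = 16 > 12 = 2(a + b), which is above the threshold, while a = 3, b = 1 gives 4 < 8, which is below it.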
Large graphs are sometimes studied through their degree sequences (power law or regular graphs). We study graphs that are uniformly chosen with a given degree sequence. Under mild conditions, it is shown that sequences of such graphs have graph limits in the sense of Lovász and Szegedy with identifiable limits. This allows simple determination of other features such as the number of triangles. The argument proceeds by studying a natural exponential model having the degree sequence as a sufficient statistic. The maximum likelihood estimate (MLE) of the parameters is shown to be unique and consistent with high probability. Thus $n$ parameters can be consistently estimated based on a sample of size one. A fast, provably convergent algorithm for the MLE is derived. These ingredients combine to prove the graph limit theorem. Along the way, a continuous version of the Erdős–Gallai characterization of degree sequences is derived. Comment: Published at http://dx.doi.org/10.1214/10-AAP728 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
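The abstract does not spell out the exponential model or the algorithm, so the following is only a sketch under stated assumptions. It assumes the model in which vertices i and j are joined independently with probability e^{β_i + β_j} / (1 + e^{β_i + β_j}), so that the likelihood equations read d_i = Σ_{j≠i} e^{β_i + β_j} / (1 + e^{β_i + β_j}). It then solves them by a simple fixed-point iteration obtained by rearranging those equations. The function names are hypothetical, and whether this matches the algorithm derived in the paper is an assumption.

```python
import math


def fit_degree_model(degrees, iterations=500):
    """Fixed-point iteration for the assumed likelihood equations.

    Rearranging d_i = sum_{j != i} e^{b_i + b_j} / (1 + e^{b_i + b_j})
    gives b_i = log d_i - log sum_{j != i} 1 / (e^{-b_j} + e^{b_i}),
    which is iterated from b = 0. Degrees must be positive.
    """
    n = len(degrees)
    beta = [0.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            s = sum(
                1.0 / (math.exp(-beta[j]) + math.exp(beta[i]))
                for j in range(n)
                if j != i
            )
            updated.append(math.log(degrees[i]) - math.log(s))
        beta = updated
    return beta


def expected_degrees(beta):
    """Expected degree of each vertex under the assumed model."""
    n = len(beta)
    return [
        sum(1.0 / (1.0 + math.exp(-(beta[i] + beta[j]))) for j in range(n) if j != i)
        for i in range(n)
    ]
```

A simple check is the regular case: if every vertex has degree d on n vertices, the fitted edge probability should be d / (n − 1) for every pair, and `expected_degrees(fit_degree_model(degrees))` should reproduce the input degrees.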
Abstract. The hardcore model is a model of lattice gas systems which has received much attention in statistical physics, probability theory and theoretical computer science. It is the probability distribution over independent sets $I$ of a graph weighted proportionally to $\lambda^{|I|}$ with fugacity parameter $\lambda$. We prove that at the uniqueness threshold of the hardcore model on the $d$-regular tree, approximating the partition function becomes computationally hard on graphs of maximum degree $d$. Specifically, we show that unless NP=RP there is no polynomial time approximation scheme for the partition function (the sum of such weighted independent sets) on graphs of maximum degree $d$ for fugacity $\lambda_c(d) < \lambda < \lambda_c(d) + \varepsilon(d)$, where $\lambda_c(d) = \frac{(d-1)^{d-1}}{(d-2)^d}$ is the uniqueness threshold on the $d$-regular tree and $\varepsilon(d) > 0$ is a positive constant. Weitz [34] produced an FPTAS for approximating the partition function when $0 < \lambda < \lambda_c(d)$, so this result demonstrates that the computational threshold exactly coincides with the statistical physics phase transition, thus confirming the main conjecture of [28]. We further analyze the special case of $\lambda = 1$, $d = 6$ and show there is no polynomial time approximation scheme for approximately counting independent sets on graphs of maximum degree $d = 6$, which is optimal, improving the previous bound of $d = 24$. Our proof is based on specially constructed random bipartite graphs which act as gadgets in a reduction to MAX-CUT. Building on the involved second moment method analysis of [28], and combined with an analysis of the reconstruction problem on the tree, our proof establishes a strong version of "replica" method heuristics developed by theoretical physicists. The result establishes the first rigorous correspondence between the hardness of approximate counting and sampling with statistical physics phase transitions.
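For intuition about the quantities involved, the sketch below computes the hardcore partition function of a small graph by brute-force enumeration of independent sets, and evaluates the standard formula λ_c(d) = (d − 1)^{d−1} / (d − 2)^d for the uniqueness threshold on the d-regular tree (every vertex having degree d, d ≥ 3). The function names are hypothetical, and the enumeration takes exponential time, so it only illustrates the definition and has no bearing on the hardness result.

```python
from itertools import combinations


def hardcore_partition_function(n, edges, lam):
    """Sum of lam ** |I| over all independent sets I of a graph on
    vertices 0..n-1, computed by brute-force enumeration."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            independent = all(
                frozenset(pair) not in edge_set
                for pair in combinations(subset, 2)
            )
            if independent:
                total += lam ** size
    return total


def uniqueness_threshold(d):
    """Uniqueness threshold (d-1)^(d-1) / (d-2)^d on the d-regular
    tree, for d >= 3."""
    return (d - 1) ** (d - 1) / (d - 2) ** d
```

On a triangle the independent sets are the empty set and the three single vertices, so the partition function is 1 + 3λ. The threshold formula gives λ_c(6) = 3125/4096 ≈ 0.763, which is below 1 and so consistent with the λ = 1, d = 6 case falling above the uniqueness threshold.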