We study a random graph model called the "stochastic block model" in statistics and the "planted partition model" in theoretical computer science. In its simplest form, this is a random graph with two equal-sized classes of vertices, with a within-class edge probability of q and a between-class edge probability of q′. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová [9], based on deep, nonrigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if q = a/n and q′ = b/n, s = (a − b)/2 and d = (a + b)/2, then Decelle et al. conjectured that it is possible to efficiently cluster in a way correlated with the true partition if s² > d and impossible if s² < d. By comparison, until recently the best-known rigorous result showed that clustering is possible if s² > Cd ln d for sufficiently large C. In a previous work, we proved that it is indeed information-theoretically impossible to cluster if s² ≤ d, and moreover that it is information-theoretically impossible even to estimate the model parameters from the graph when s² < d. Here we prove the rest of the conjecture by providing an efficient algorithm for clustering in a way that is correlated with the true partition when s² > d. A different, independent proof of the same result was recently obtained by Massoulié [21].
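To make the parametrization concrete, here is a minimal Python sketch (our illustration, not code from the paper) that samples the two-class model with q = a/n and q′ = b/n and evaluates the conjectured condition s² > d. Multiplying both sides by 4 shows that s² > d is the same condition as (a − b)² > 2(a + b). The function names and the convention that the first n/2 vertices form one class are hypothetical choices made for the example.

```python
import random


def sample_planted_partition(n, a, b, seed=0):
    """Sample the two-class planted partition model on n vertices (n even).

    Hypothetical convention: vertices 0..n/2-1 form class 0, the rest class 1.
    Each within-class pair is an edge with probability q = a/n and each
    between-class pair with probability q' = b/n, independently.
    """
    if n % 2 or not (0 <= a <= n and 0 <= b <= n):
        raise ValueError("need even n and 0 <= a, b <= n")
    rng = random.Random(seed)
    labels = [0] * (n // 2) + [1] * (n // 2)
    q, q_prime = a / n, b / n
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = q if labels[i] == labels[j] else q_prime
            if rng.random() < prob:
                adj[i].append(j)
                adj[j].append(i)
    return adj, labels


def above_conjectured_threshold(a, b):
    """Return (s, d, s**2 > d) with s = (a - b)/2 and d = (a + b)/2."""
    s = (a - b) / 2
    d = (a + b) / 2
    return s, d, s * s > d


print(above_conjectured_threshold(5, 1))  # (2.0, 3.0, True)
print(above_conjectured_threshold(4, 2))  # (1.0, 3.0, False)
adj, labels = sample_planted_partition(400, 5, 1, seed=1)
print(sum(map(len, adj)) / len(adj))  # mean degree; should be near d = 3
```

For large n the mean degree is (a + b)/2 − a/n, which is approximately d, so d plays the role of the average degree and s measures how strongly edges favor the same class.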
Electronic Journal of Probability
Consistency thresholds for the planted bisection model
Abstract
The planted bisection model is a random graph model in which the nodes are divided into two equal-sized communities and then edges are added randomly in a way that depends on the community membership. We establish necessary and sufficient conditions for the asymptotic recoverability of the planted bisection in this model. When the bisection is asymptotically recoverable, we give an efficient algorithm that successfully recovers it. We also show that the planted bisection is recoverable asymptotically if and only if, with high probability, every node belongs to the same community as the majority of its neighbors. Our algorithm for finding the planted bisection runs in time almost linear in the number of edges. It has three stages: spectral clustering to compute an initial guess, a "replica" stage to get almost every vertex correct, and then some simple local moves to finish the job. An independent work by Abbe, Bandeira, and Hall establishes similar (slightly weaker) results, but only in the case of logarithmic average degree.
[…] the smallest possible number of edges. This problem is known to be NP-complete in the worst case [14], but on a random graph model with a "planted" small bisection one might hope that it is usually easy. Indeed, Dyer and Frieze showed that if p_n = p > q = q_n are fixed as n → ∞, then with high probability the bisection that separates the two classes is the minimum bisection, and it can be found in expected O(n³) time. These models were introduced slightly earlier in the statistics literature [12] (under the name "stochastic block model") in order to study the problem of community detection in random graphs. Here, the two parts of the bisection are interpreted as latent "communities" in a network, and the goal is to identify them from the observed graph structure.
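Both the recoverability criterion in the abstract and the final "local moves" stage revolve around neighborhood majorities. The sketch below (our illustration; the function names and the toy graph are hypothetical) counts vertices that disagree with the majority of their neighbors and performs one synchronous round of majority relabeling. It is only a simplified stand-in for the final stage, whose exact form the abstract does not specify, and it does not attempt the spectral or "replica" stages.

```python
def majority_violations(adj, labels):
    """Count vertices whose label does not agree with a strict majority of neighbors.

    adj[v] lists the neighbors of v; labels[v] is 0 or 1. A tie (or an isolated
    vertex) counts as a violation, the conservative reading of "majority".
    """
    bad = 0
    for v, nbrs in enumerate(adj):
        same = sum(1 for u in nbrs if labels[u] == labels[v])
        if same <= len(nbrs) - same:
            bad += 1
    return bad


def majority_sweep(adj, guess):
    """One synchronous round of local moves: each vertex takes the majority label
    of its neighbors under the current guess; ties keep the current label."""
    new = list(guess)
    for v, nbrs in enumerate(adj):
        ones = sum(guess[u] for u in nbrs)
        zeros = len(nbrs) - ones
        if ones > zeros:
            new[v] = 1
        elif zeros > ones:
            new[v] = 0
    return new


def two_cliques_with_bridge():
    """Toy graph: cliques on {0,1,2,3} and {4,5,6,7} joined by the edge 3-4."""
    edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
    edges.append((3, 4))
    adj = [[] for _ in range(8)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


adj = two_cliques_with_bridge()
truth = [0, 0, 0, 0, 1, 1, 1, 1]
guess = [1, 0, 0, 0, 1, 1, 1, 1]  # vertex 0 mislabeled
print(majority_violations(adj, truth))  # 0
print(majority_violations(adj, guess))  # 2 (vertices 0 and 3)
print(majority_sweep(adj, guess) == truth)  # True
```

When the true labeling has no majority violations and the current guess has only a few isolated errors, a majority round like this one can repair them; if the truth itself violates the majority condition at some vertex, no purely local rule of this kind can be expected to label that vertex correctly.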
If p_n > q_n, the maximum a posteriori estimate of the true communities is exactly the minimum bisection (see the discussion leading to Lemma 4.1), and so community detection on a stochastic block model is exactly the Min-Bisection problem on a planted bisection model; hence, we will use the statistical and computer science terminologies interchangeably. We note, however, that the statistics literature is slightly more general, in the sense that it often allows q_n > p_n, and sometimes relaxes the problem by allowing the detected communities to contain some errors.
Our main contribution is a necessary and sufficient condition on p_n and q_n for recoverability of the planted bisection. When the bisection can be recovered, we provide an efficient algorithm for doing so.
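The equivalence between the MAP estimate and the minimum bisection follows from a short calculation, assuming a uniform prior over balanced labelings (so MAP coincides with maximum likelihood). For any bisection, the numbers of within-class and between-class vertex pairs are fixed, and so is the total number of edges; hence the log-likelihood equals a constant plus cut · log(q_n(1 − p_n) / (p_n(1 − q_n))), where cut is the number of edges crossing the bisection. When p_n > q_n this logarithm is negative, so maximizing the likelihood is the same as minimizing the cut. The brute-force Python check below (hypothetical helper names; p and q stand for p_n and q_n; feasible only for tiny graphs) confirms this on a small example.

```python
import itertools
import math


def log_likelihood(n, edges, labels, p, q):
    """log P(G | labels): each within-class pair is an edge w.p. p and each
    between-class pair w.p. q, independently."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        prob = p if labels[i] == labels[j] else q
        total += math.log(prob if frozenset((i, j)) in edge_set else 1 - prob)
    return total


def cut_size(edges, labels):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for i, j in edges if labels[i] != labels[j])


def map_vs_min_bisection(n, edges, p, q):
    """Return (MAP labelings, minimum-cut labelings) over all bisections."""
    candidates = []
    for side in itertools.combinations(range(n), n // 2):
        chosen = set(side)
        candidates.append(tuple(1 if v in chosen else 0 for v in range(n)))
    lls = [log_likelihood(n, edges, lab, p, q) for lab in candidates]
    cuts = [cut_size(edges, lab) for lab in candidates]
    best_ll, best_cut = max(lls), min(cuts)
    # isclose absorbs rounding differences between equally likely labelings
    map_set = {lab for lab, ll in zip(candidates, lls) if math.isclose(ll, best_ll)}
    min_set = {lab for lab, c in zip(candidates, cuts) if c == best_cut}
    return map_set, min_set


# Usage: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
map_set, min_set = map_vs_min_bisection(6, edges, p=0.7, q=0.2)
print(map_set == min_set)  # True
print(sorted(map_set))  # [(0, 0, 0, 1, 1, 1), (1, 1, 1, 0, 0, 0)]
```

Each bisection appears twice (once per swap of the two sides), which is why both labelings are returned. Swapping the roles so that q > p reverses the sign of the coefficient, and the MAP estimate then becomes a maximum bisection instead.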
We consider the problem of reconstructing sparse symmetric block models with two blocks and connection probabilities a/n and b/n for inter- and intra-block edge probabilities, respectively. It was recently shown that one can do better than a random guess if and only if (a − b)² > 2(a + b). Using a variant of belief propagation, we give a reconstruction algorithm that is optimal in the sense that if (a − b)² > C(a + b) for some constant C, then our algorithm maximizes the fraction of the nodes labeled correctly. Ours is the only algorithm proven to achieve the optimal fraction of nodes labeled correctly. Along the way, we prove some results of independent interest regarding robust reconstruction for the Ising model on regular and Poisson trees.
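The abstract does not spell out its belief-propagation variant. As a rough illustration of the general idea only, the Python sketch below runs textbook loopy belief propagation in log-odds form on a sampled two-block graph, starting from a noisy initial guess that enters as a fixed per-vertex prior. It is not the authors' algorithm. Assumptions, all ours: a/n is taken as the within-block edge probability with a > b > 0 (the threshold condition is symmetric in a and b, but the sign of the BP update is not); the non-edge ("external field") terms of exact BP are dropped, a common simplification for sparse balanced graphs; and the initial guess is synthetic, made by flipping each true label independently. The names `sample_sbm` and `bp_sbm` are hypothetical. The edge message log((a e^h + b)/(b e^h + a)) is computed in the equivalent, numerically stable Ising form 2 atanh(θ tanh(h/2)) with θ = (a − b)/(a + b).

```python
import math
import random


def sample_sbm(n, a, b, rng):
    """Two equal blocks; within-block edge prob a/n, between-block b/n (assumed)."""
    labels = [0] * (n // 2) + [1] * (n - n // 2)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (a if labels[i] == labels[j] else b) / n:
                adj[i].append(j)
                adj[j].append(i)
    return adj, labels


def bp_sbm(adj, prior, a, b, iters=10):
    """Loopy BP in log-odds form; prior[v] = log-odds that v is in block 1.

    Non-edge (external-field) terms are ignored. Returns hard 0/1 labels.
    """
    assert a > b > 0, "sketch assumes assortative blocks with b > 0"
    theta = (a - b) / (a + b)  # |theta| < 1, so atanh below stays finite

    def f(h):
        # log((a e^h + b) / (b e^h + a)) written as 2 atanh(theta tanh(h/2))
        return 2 * math.atanh(theta * math.tanh(h / 2))

    n = len(adj)
    # msg[(u, v)] is the log-odds message sent from u to its neighbor v
    msg = {(u, v): prior[u] for u in range(n) for v in adj[u]}
    for _ in range(iters):
        incoming = [sum(f(msg[(k, v)]) for k in adj[v]) for v in range(n)]
        # exclude the reverse message so u does not echo v's own information
        msg = {(u, v): prior[u] + incoming[u] - f(msg[(v, u)]) for (u, v) in msg}
    belief = [prior[v] + sum(f(msg[(k, v)]) for k in adj[v]) for v in range(n)]
    return [1 if h > 0 else 0 for h in belief]


rng = random.Random(7)
adj, truth = sample_sbm(400, 20, 2, rng)
# Synthetic initial guess: each true label flipped independently w.p. 0.3.
guess = [t if rng.random() >= 0.3 else 1 - t for t in truth]
lam = math.log(0.7 / 0.3)
prior = [lam if g == 1 else -lam for g in guess]
labels = bp_sbm(adj, prior, 20, 2)
acc = sum(x == t for x, t in zip(labels, truth)) / len(truth)
print(f"accuracy after BP: {acc:.3f}")
```

Because the synthetic guess is independent of the graph, adding its log-odds as a prior is exact here; a guess computed from the same graph would not be independent of it.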