Statistical inference problems arising within signal processing, data mining, and machine learning naturally give rise to hard combinatorial optimization problems. These problems become intractable when the dimensionality of the data is large, as is often the case for modern datasets. A popular idea is to construct convex relaxations of these combinatorial problems, which can be solved efficiently for large-scale datasets. Semidefinite programming (SDP) relaxations are among the most powerful methods in this family and are surprisingly well suited for a broad range of problems where data take the form of matrices or graphs. It has been observed several times that when the statistical noise is small enough, SDP relaxations correctly detect the underlying combinatorial structures. In this paper we develop asymptotic predictions for several detection thresholds, as well as for the estimation error above these thresholds. We study some classical SDP relaxations for statistical problems motivated by graph synchronization and community detection in networks. We map these optimization problems to statistical mechanics models with vector spins and use nonrigorous techniques from statistical mechanics to characterize the corresponding phase transitions. Our results clarify the effectiveness of SDP relaxations in solving high-dimensional statistical problems.

Modern datasets pose new challenges to this centuries-old framework. On one hand, high-dimensional applications require the simultaneous estimation of millions of parameters. Examples span genomics (2), imaging (3), web services (4), and so on. On the other hand, the unknown object to be estimated often has a combinatorial structure: In clustering, we aim to estimate a partition of the data points (5). Network analysis tasks usually require identification of a discrete subset of nodes in a graph (6, 7).
Parsimonious data explanations are sought by imposing combinatorial sparsity constraints (8).

There is an obvious tension between the above requirements. Although efficient algorithms are needed to estimate a large number of parameters, the maximum likelihood (ML) method often requires the solution of NP-hard (nondeterministic polynomial-time hard) combinatorial problems. A flourishing line of work addresses this conundrum by designing effective convex relaxations of these combinatorial problems (9-11).

Unfortunately, the statistical properties of such convex relaxations are well understood only in a few cases [compressed sensing being the most important success story (12-14)]. In this paper we use tools from statistical mechanics to develop a precise picture of the behavior of a class of semidefinite programming relaxations. Relaxations of this type appear to be surprisingly effective in a variety of problems ranging from clustering to graph synchronization. For the sake of concreteness we will focus on three specific problems.
$\mathbb{Z}_2$ Synchronization

In the general synchronization problem, we aim at estimating $x_{0,1}, x_{0,2}, \dots, x_{0,n}$, which are unknown elements of ...