We consider the eigenvalues and eigenvectors of finite, low rank perturbations of random matrices. Specifically, we prove almost sure convergence of the extreme eigenvalues and appropriate projections of the corresponding eigenvectors of the perturbed matrix for additive and multiplicative perturbation models. The limiting non-random value is shown to depend explicitly on the limiting eigenvalue distribution of the unperturbed random matrix and the assumed perturbation model via integral transforms that correspond to very well-known objects in free probability theory that linearize non-commutative free additive and multiplicative convolution. Furthermore, we uncover a phase transition phenomenon whereby the large matrix limit of the extreme eigenvalues of the perturbed matrix differs from that of the original matrix if and only if the eigenvalues of the perturbing matrix are above a certain critical threshold. Square root decay of the eigenvalue density at the edge is sufficient to ensure that this threshold is finite. This critical threshold is intimately related to the same aforementioned integral transforms and our proof techniques bring this connection and the origin of the phase transition into focus. Consequently, our results extend the
✩ F.B.G.'s work was partially supported by the Agence Nationale de la Recherche grant ANR-08-BLAN-0311-03. R.R.N.'s research was partially supported by an Office of Naval Research postdoctoral fellowship award and grant N00014-07-1-0269. R.R.N. thanks Arthur Baggeroer for his feedback, support and encouragement. We thank Alan Edelman for feedback and encouragement and for facilitating this collaboration by hosting F.B.G.'s stay at M.I.T. We gratefully acknowledge the Singapore-MIT Alliance for funding F.B.G.'s stay.
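The phase transition described in this abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the simplest additive case, a symmetric Gaussian (Wigner) matrix scaled so that its spectrum fills [-2, 2], perturbed by a rank-one matrix theta * u u^T. For that special case the standard result is that the largest eigenvalue tends to theta + 1/theta when theta > 1 and stays at the bulk edge 2 otherwise. The function names `top_eigenvalue` and `predicted_limit` are hypothetical.

```python
import numpy as np


def top_eigenvalue(n, theta, rng):
    """Largest eigenvalue of a Wigner matrix plus a rank-one perturbation."""
    # Symmetric Gaussian matrix with off-diagonal variance 1/n, so the
    # limiting eigenvalue distribution is the semicircle law on [-2, 2].
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2 * n)
    # Rank-one additive perturbation theta * u u^T with a unit vector u
    # (assumption: u is the first coordinate vector).
    u = np.zeros(n)
    u[0] = 1.0
    x = w + theta * np.outer(u, u)
    # eigvalsh returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(x)[-1]


def predicted_limit(theta):
    """Assumed large-n limit for this Wigner special case."""
    return theta + 1.0 / theta if theta > 1.0 else 2.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for theta in (0.5, 1.5, 3.0):
        observed = top_eigenvalue(500, theta, rng)
        print(f"theta={theta}: observed={observed:.3f}, "
              f"predicted={predicted_limit(theta):.3f}")
```

Below the threshold the perturbation leaves no trace in the top eigenvalue; above it, the top eigenvalue separates from the bulk. At finite n the observed values fluctuate around the predicted limits.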
We study networks that display community structure: groups of nodes within which connections are unusually dense. Using methods from random matrix theory, we calculate the spectra of such networks in the limit of large size, and hence demonstrate the presence of a phase transition in matrix methods for community detection, such as the popular modularity maximization method. The transition separates a regime in which such methods successfully detect the community structure from one in which the structure is present but is not detected. By comparing these results with recent analyses of maximum-likelihood methods we are able to show that spectral modularity maximization is an optimal detection method in the sense that no other method will succeed in the regime where the modularity method fails.
The problem of community detection in networks has attracted a substantial amount of attention in recent years [1,2]. Communities in this context are groups of vertices within a network that have a high density of within-group connections but a lower density of between-group connections. The challenge is to find such groups accurately and efficiently in a given network; the ability to do so would have applications in the analysis of observational data, network visualization, and complexity reduction and parallelization of network problems.
In this paper we focus on matrix methods for community detection, which are based on the properties of matrix representations of networks such as the adjacency matrix or the modularity matrix. While significant effort has been devoted to the development of practical algorithms using these methods, there has been less work on formal examination of their properties and implications for algorithm performance. Here we give an analysis of the spectral properties of the adjacency and modularity matrices using random matrix methods, and in the process uncover a number of results of practical importance.
Chief among these is the presence of a sharp transition between a regime in which the spectrum contains clear evidence of community structure and a regime in which it contains none. In the former regime, community detection is possible and current algorithms should perform well; in the latter, any method relying on the spectrum to perform structure detection must fail. A similar phase transition has been reported recently in an analysis of a different class of detection methods, based on Bayesian inference [3]. By comparing the two analyses, we are able to demonstrate that methods such as modularity maximization are optimal, in the sense that no other method will succeed where they fail.
For the formal analysis of community structured networks, we must define the particular network or networks we will study. In this paper we focus on the most widely studied model of community structure, the stochastic block model, although our methods could be applied to other models as well. The stochastic block model, in its simplest form, divides a network of n vertices into some number q of groups denoted by r = 1 ...
The detection and estimation of signals in noisy, limited data is a problem of interest to many scientific and engineering communities. We present a mathematically justifiable, computationally simple, sample eigenvalue based procedure for estimating the number of high-dimensional signals in white noise using relatively few samples. The main motivation for considering a sample eigenvalue based scheme is the computational simplicity and the robustness to eigenvector modelling errors, which can adversely impact the performance of estimators that exploit information in the sample eigenvectors.
There is, however, a price we pay by discarding the information in the sample eigenvectors; we highlight a fundamental asymptotic limit of sample eigenvalue based detection of weak/closely spaced high-dimensional signals from a limited sample size. This motivates our heuristic definition of the effective number of identifiable signals, which is equal to the number of "signal" eigenvalues of the population covariance matrix which exceed the noise variance by a factor strictly greater than $1 + \sqrt{\text{Dimensionality of the system} / \text{Sample size}}$. The fundamental asymptotic limit brings into sharp focus why, when there are too few samples available so that the effective number of signals is less than the actual number of signals, underestimation of the model order is unavoidable (in an asymptotic sense) when using any sample eigenvalue based detection scheme, including the one proposed herein. The analysis reveals why adding more sensors can only exacerbate the situation. Numerical simulations are used to demonstrate that the proposed estimator, like Wax and Kailath's MDL based estimator, consistently estimates the true number of signals in the dimension fixed, large sample size limit and the effective number of identifiable signals, unlike Wax and Kailath's MDL based estimator, in the large dimension, (relatively) large sample size limit.
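The definition of the effective number of identifiable signals can be written as a short counting rule. The sketch below is only an illustration, not the detection procedure proposed in the paper: it assumes the population eigenvalues and the noise variance are known, and that the threshold factor is 1 + sqrt(dimension / sample size). The function name and argument names are hypothetical.

```python
import math


def effective_number_of_signals(population_eigenvalues, noise_variance,
                                dimension, sample_size):
    """Count population eigenvalues that are asymptotically identifiable.

    Assumption: an eigenvalue counts as identifiable when it exceeds
    noise_variance * (1 + sqrt(dimension / sample_size)) strictly.
    """
    threshold = noise_variance * (1.0 + math.sqrt(dimension / sample_size))
    return sum(1 for lam in population_eigenvalues if lam > threshold)


if __name__ == "__main__":
    signal_eigenvalues = [10.0, 3.0, 1.5]
    for samples in (100, 10000):
        k = effective_number_of_signals(signal_eigenvalues, 1.0, 100, samples)
        print(f"dimension=100, samples={samples}: effective signals = {k}")
```

With the dimension held at 100, increasing the sample size lowers the threshold, so weaker signal eigenvalues become identifiable. Increasing the dimension at a fixed sample size has the opposite effect, which matches the remark that adding more sensors can make the situation worse.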
Abstract. The truncated singular value decomposition (SVD) of the measurement matrix is the optimal solution to the representation problem of how to best approximate a noisy measurement matrix using a low-rank matrix. Here, we consider the (unobservable) denoising problem of how to best approximate a low-rank signal matrix buried in noise by optimal (re)weighting of the singular vectors of the measurement matrix. We exploit recent results from random matrix theory to exactly characterize the large matrix limit of the optimal weighting coefficients and show that they can be computed directly from data for a large class of noise models that includes the i.i.d. Gaussian noise case.
Our analysis brings into sharp focus the shrinkage-and-thresholding form of the optimal weights, the non-convex nature of the associated shrinkage function (on the singular values) and explains why matrix regularization via singular value thresholding with convex penalty functions (such as the nuclear norm) will always be suboptimal. We validate our theoretical predictions with numerical simulations, develop an implementable algorithm (OptShrink) that realizes the predicted performance gains and show how our methods can be used to improve estimation in the setting where the measured matrix has missing entries.