We analyze the performance of spectral clustering for community extraction in stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as log n, with n the number of nodes. This result applies to some popular polynomial time spectral clustering algorithms and is further extended to degree corrected stochastic block models using a spherical k-median spectral clustering method. A key component of our analysis is a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.
We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedascity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
Persistent homology is a method for probing topological properties of point
clouds and functions. The method involves tracking the birth and death of
topological features (2000) as one varies a tuning parameter. Features with
short lifetimes are informally considered to be "topological noise," and those
with a long lifetime are considered to be "topological signal." In this paper,
we bring some statistical ideas to persistent homology. In particular, we
derive confidence sets that allow us to separate topological signal from
topological noise.Comment: Published in at http://dx.doi.org/10.1214/14-AOS1252 the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.