Modern neural networks are highly overparameterized, with enough capacity to substantially overfit the training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be "compressed" to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size that, combined with off-the-shelf compression algorithms, leads to state-of-the-art generalization guarantees. In particular, we provide the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem. Additionally, we show that the compressibility of models that tend to overfit is limited. Empirical results show that an increase in overfitting increases the number of bits required to describe a trained network.
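As a rough illustration of the compression-to-generalization connection (a generic Occam/finite-codebook argument, not the PAC-Bayesian bound developed in the paper), any model describable in k bits belongs to a codebook of at most 2^k hypotheses, and Hoeffding's inequality plus a union bound controls its generalization gap. The sketch below, with illustrative names and numbers, computes that generic bound:

```python
import math

def occam_generalization_gap(k_bits: int, n_samples: int, delta: float = 0.05) -> float:
    """Generic Occam bound for a classifier describable in k_bits.

    With probability at least 1 - delta over an i.i.d. sample of size n_samples,
    every hypothesis in a fixed codebook of at most 2**k_bits models satisfies
        test_error <= train_error + sqrt((k_bits * ln 2 + ln(1/delta)) / (2 * n_samples)).
    This is a textbook sketch, not the paper's exact bound.
    """
    return math.sqrt((k_bits * math.log(2) + math.log(1.0 / delta)) / (2.0 * n_samples))

# Hypothetical example: a network compressed to ~500 kilobits,
# trained on ~1.2M labeled examples.
print(occam_generalization_gap(k_bits=500_000, n_samples=1_200_000))  # ~0.38
```

The qualitative message matches the abstract: fewer bits in the compressed description directly tighten the bound, which is the sense in which compressibility can certify generalization.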
We consider random processes whose distribution satisfies a symmetry property; examples of such properties include exchangeability and stationarity, among others. We show that, under a suitable mixing condition, estimates computed as ergodic averages of such processes satisfy a central limit theorem, a Berry-Esseen bound, and a concentration inequality. These are generalized further to triangular arrays, to a class of generalized U-statistics, and to a form of random censoring. As applications, we obtain new results on exchangeability, and on estimation in random fields and certain network models; extend results on graphon models to stochastic block models with a growing number of classes; give a simpler proof of a recent central limit theorem for marked point processes; and establish asymptotic normality of the empirical entropy of a large class of processes. In certain special cases, we recover well-known properties, which can hence be interpreted as a direct consequence of symmetry. The proofs adapt Stein's method.
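For concreteness, the flavor of such a result can be written down in its most familiar special case, a stationary sequence with suitable mixing and summable autocovariances; the symbols below are illustrative and are not the paper's general setting:

```latex
% Ergodic average of a stationary, mixing sequence (X_i) with E[f(X_1)] = \theta.
% Under suitable mixing and moment conditions (illustrative special case only):
\[
  \hat{\theta}_n = \frac{1}{n}\sum_{i=1}^{n} f(X_i),
  \qquad
  \sqrt{n}\,\bigl(\hat{\theta}_n - \theta\bigr) \xrightarrow{\;d\;} \mathcal{N}\!\left(0, \sigma^2\right),
\]
\[
  \sigma^2 = \operatorname{Var}\bigl(f(X_1)\bigr)
           + 2\sum_{k=1}^{\infty} \operatorname{Cov}\bigl(f(X_1), f(X_{1+k})\bigr).
\]
```

The paper's contribution is to obtain statements of this kind (plus Berry-Esseen and concentration bounds) from the symmetry of the process rather than from stationarity alone.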
Let $K(X_1, \ldots, X_n)$ and $H(X_n \mid X_{n-1}, \ldots, X_1)$ denote the Kolmogorov complexity and Shannon's entropy rate of a stationary and ergodic process $\{X_i\}_{i=-\infty}^{\infty}$. It has been proved that $K(X_1, \ldots, X_n)/n - H(X_n \mid X_{n-1}, \ldots, X_1) \to 0$ almost surely. This paper studies the convergence rate of this asymptotic result. In particular, we show that if the process satisfies certain mixing conditions, then there exists $\sigma < \infty$ such that $\sqrt{n}\bigl(K(X_{1:n})/n - H(X_0 \mid X_{-1}, \ldots, X_{-\infty})\bigr) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$. Furthermore, we show that under slightly stronger mixing conditions one may obtain non-asymptotic concentration bounds for the Kolmogorov complexity.
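Since Kolmogorov complexity is uncomputable, one way to see the flavor of the per-symbol comparison is to substitute a practical compressor for $K$ and compare its code length with the entropy rate of a simple i.i.d. source. The sketch below is purely illustrative: zlib only upper-bounds the ideal description length, and the source here is Bernoulli rather than a general mixing process:

```python
import math
import random
import zlib

def binary_entropy(p: float) -> float:
    """Shannon entropy (bits per symbol) of a Bernoulli(p) source."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def compressed_bits_per_symbol(bits: list[int]) -> float:
    """Per-symbol length of a zlib-compressed encoding of a 0/1 sequence.

    zlib is a computable stand-in for Kolmogorov complexity, so this is an
    upper-bound-style proxy for K(X_{1:n})/n, not K itself.
    """
    packed = bytes(
        sum(b << j for j, b in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )
    return 8 * len(zlib.compress(packed, level=9)) / len(bits)

random.seed(0)
p, n = 0.1, 200_000
sample = [1 if random.random() < p else 0 for _ in range(n)]
print(f"entropy rate     : {binary_entropy(p):.3f} bits/symbol")
print(f"zlib code length : {compressed_bits_per_symbol(sample):.3f} bits/symbol")
```

The per-symbol compressed length sits above, but on the same scale as, the entropy rate; the abstract's result concerns how fast (and with what fluctuations) the gap between $K(X_{1:n})/n$ and the entropy rate vanishes.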