We develop a fast, tractable technique called Net-Trim for simplifying a trained neural network. The method is a convex post-processing module which prunes (sparsifies) a trained network layer by layer while preserving its internal responses. We present a comprehensive analysis of Net-Trim from both the algorithmic and sample-complexity standpoints, centered on a fast, scalable convex optimization program. Our analysis includes consistency results between the initial and retrained models (before and after Net-Trim is applied), and guarantees on the number of training samples needed to discover a network that can be expressed using a certain number of nonzero terms. Specifically, if there is a set of weights using at most s terms that can re-create the layer outputs from the layer inputs, we can find these weights from O(s log(N/s)) samples, where N is the input size. These theoretical results are similar to those for sparse regression using the Lasso, and our analysis uses some of the same tools (namely, recent results on concentration of measure and convex analysis). Finally, we propose an algorithmic framework based on the alternating direction method of multipliers (ADMM), which allows a fast and simple implementation of Net-Trim for network pruning and compression.
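To make the layer-by-layer idea concrete, the following is a minimal sketch of a per-layer pruning program in the spirit of Net-Trim, written with the generic solver cvxpy rather than the paper's ADMM implementation. The variable names (X, Y, eps), the assumption of ReLU activations, and the particular way the constraints are split into "active" and "inactive" parts are illustrative assumptions for this sketch, not a definitive account of the paper's formulation.

```python
# Illustrative per-layer pruning sketch in the spirit of Net-Trim
# (a generic cvxpy model; the paper develops a dedicated ADMM solver).
import cvxpy as cp

def prune_layer(X, Y, eps):
    """Search for a sparse weight matrix W with relu(W.T @ X) close to Y.

    X   : (N, P) layer inputs, one column per training sample
    Y   : (M, P) recorded ReLU responses of the trained layer
    eps : allowed slack (eps >= 0) in reproducing the responses
    """
    N, _ = X.shape
    M = Y.shape[0]
    W = cp.Variable((N, M))
    Z = W.T @ X                     # candidate pre-activations, shape (M, P)

    active = (Y > 0).astype(float)  # entries where the trained layer's ReLU fired
    # Where the recorded response is positive, reproduce it up to eps (Frobenius norm);
    # where it is zero, only require the new pre-activation to stay at or below eps.
    constraints = [
        cp.norm(cp.multiply(active, Z - Y), "fro") <= eps,
        cp.multiply(1.0 - active, Z) <= eps,
    ]
    # Minimize the sum of absolute weights to promote a sparse (pruned) layer.
    problem = cp.Problem(cp.Minimize(cp.sum(cp.abs(W))), constraints)
    problem.solve()
    return W.value
```

In a deep network, a program of this kind would be run once per layer, either in parallel (each layer fit against the original trained responses) or in cascade (each layer fit against the pruned previous layer's outputs); both variants and their consistency bounds are discussed in Section 3, and the ADMM framework mentioned above provides a faster, purpose-built solver than the generic one used in this sketch.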
P ≳ s log(N/s). We also show that if the x_p are subgaussian, then so are the y_p. As a result, the theory can be applied layer by layer, yielding a sampling result for networks of arbitrary depth. (When we apply the algorithm in practice, the equality constraints in (1) are relaxed; this is discussed in detail in Section 3.1.)

Along with these theoretical guarantees, Net-Trim offers state-of-the-art performance on realistic networks. In Section 6, we present numerical experiments showing that compression factors between 10x and 50x (removing 90% to 98% of the connections) are possible with very little loss in test accuracy.

Contributions and relations to previous work
This paper provides a full description of the Net-Trim method from both a theoretical and an algorithmic perspective. In Section 3, we present our convex formulation for sparsifying the weights in the linear layers of a network; we describe how the procedure can be applied layer by layer in a deep network, either in parallel or serially (cascading the results), and present consistency bounds for both approaches. Section 4 presents our main theoretical result, stated precisely in Theorem 4. This result derives an upper bound on the number of data samples needed to reliably discover a layer whose linear map has at most s connections: we show that if the data samples are random, then these weights can be learned from O(s log(N/s)) samples. Mathematically, this result is comparable to the sample complexity bounds for the Lasso in performing sparse regression on a linear model (also known as the compressed sensing problem). Our analysis is based on the bowling scheme [30, 24]; the main technical challenges are adapting this technique to the piecewise linear...