2018
DOI: 10.1214/17-aap1328

A random matrix approach to neural networks

Abstract: This article studies the Gram random matrix model G = (1/T) ΣᵀΣ, Σ = σ(WX), classically found in the analysis of random feature maps and random neural networks, where X = [x_1, …, x_T] ∈ ℝ^{p×T} is a (data) matrix of bounded norm, W ∈ ℝ^{n×p} is a matrix of independent zero-mean, unit-variance entries, and σ : ℝ → ℝ is a Lipschitz continuous (activation) function, σ(WX) being understood entrywise. By means of a key concentration of measure lemma arising from non-asymptotic random matrix arguments, we prove th…
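
As a point of reference, here is a minimal NumPy sketch of this Gram matrix model; the dimensions, the tanh activation and the Gaussian data below are illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative sizes (not from the paper): p input features, T samples, n random neurons.
p, T, n = 64, 512, 256

rng = np.random.default_rng(0)
X = rng.standard_normal((p, T)) / np.sqrt(p)   # data matrix of bounded norm
W = rng.standard_normal((n, p))                # independent zero-mean, unit-variance entries
sigma = np.tanh                                # any Lipschitz activation, applied entrywise

Sigma = sigma(W @ X)                           # n x T random-feature matrix Σ = σ(WX)
G = (Sigma.T @ Sigma) / T                      # T x T Gram matrix G = (1/T) ΣᵀΣ
```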

Cited by 99 publications (125 citation statements)
References 36 publications

“…In this section we go through the exercise of establishing the conditions in (23) for a neuron fed with independent copies of x ∼ N(0, I), x ∈ ℝ^N. Below, we go through each bound in (23), separately. In all the calculations, w_0 = 0 is a fixed vector that corresponds to the initially trained model.…”
Section: Feeding a Neuron With IID Gaussian Samples (mentioning, confidence: 99%)
“…In [2], the authors go through a chain of techniques to prove an O(s log N) sample complexity by carefully constructing a dual certificate for the convex program. Here we will see that thanks to Theorem 4, such process is markedly reduced to establishing the conditions in (23), which is conveniently fulfilled using standard tools.…”
Section: Feeding a Neuron With IID Gaussian Samples (mentioning, confidence: 99%)
“…The choice of ELM as a learning approach is motivated by its theoretical capacity to learn any non-linear mapping, as well as its simplicity [10], [11] and amenability to theoretical analysis [15].…”
Section: B. Learning Approaches (mentioning, confidence: 99%)
“…where we have made explicit their randomness through the dependency on the random weight matrix W. The authors of [15] show that for the regression in (6), both training and testing mean square errors almost surely converge to some deterministic limit, i.e.,…”
Section: Localization Performance (mentioning, confidence: 99%)
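
The regression labelled (6) in the citing work is not reproduced here; as a rough illustration of the quoted result, below is a minimal sketch of ridge regression on random features Σ = σ(WX), of the extreme-learning-machine kind analyzed in [15]. All sizes, the tanh activation, the regularization value and the linear-plus-noise targets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
p, T, n = 32, 400, 128                         # illustrative sizes: input dim, samples, random neurons
lam = 1e-2                                     # illustrative ridge regularization

# Training data with simple linear-plus-noise targets (purely illustrative)
X = rng.standard_normal((p, T)) / np.sqrt(p)
a = rng.standard_normal(p)
y = X.T @ a + 0.1 * rng.standard_normal(T)

W = rng.standard_normal((n, p))                # fixed random first-layer weights
Sigma = np.tanh(W @ X)                         # random features, n x T

# Ridge output weights, then the training and testing mean square errors,
# i.e. the quantities whose almost-sure deterministic limits are characterized in [15].
beta = np.linalg.solve(Sigma @ Sigma.T / T + lam * np.eye(n), Sigma @ y / T)
train_mse = np.mean((y - Sigma.T @ beta) ** 2)

X_test = rng.standard_normal((p, T)) / np.sqrt(p)
y_test = X_test.T @ a + 0.1 * rng.standard_normal(T)
test_mse = np.mean((y_test - np.tanh(W @ X_test).T @ beta) ** 2)
print(train_mse, test_mse)
```
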
“…In a recent line of works initiated in [1], in the large p and n asymptotics, kernel random matrices have been explored and have led to a completely renewed understanding of kernel approaches, starting with the asymptotic performance (and sometimes inconsistency) of kernel classification and spectral clustering. This includes kernel-based (least-square) support vector machines [2], semi-supervised classification [3] and spectral clustering [4]–[6], but also neural network derivatives such as extreme learning machines [7]. The main lever to analyze the performance of kernel matrices K ∈ ℝ^{n×n} in the large dimensional regime (p, n → ∞ with p/n → c_0 > 0) lies in the fact that, under appropriate (what we shall call here "asymptotically non-trivial") growth rate assumptions on the data statistics, the entries K_ij = f(x_iᵀ x_j) or K_ij = f(‖x_i − x_j‖²) of K tend to converge to a limiting constant, irrespective of the data class (when classification is concerned), thereby allowing for a study of K through a Taylor expansion; this gives way in particular to the possible analysis of the eigenvectors of K or to functionals of K for all large p, n. These expansions notably set forth the discriminative effect of kernel-based classification methods as they tend to emphasize (in the structure of the dominant eigenvector of K notably) the statistical difference between the class means and class covariances, this emphasis being strongly related to the derivatives of f at a certain location.…”
Section: Introduction (mentioning, confidence: 99%)
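
As a small numerical illustration of the entrywise concentration the quoted passage relies on, the sketch below builds a distance kernel K_ij = f(‖x_i − x_j‖²) on two Gaussian classes; the Gaussian mixture, the exponential choice of f and all sizes are illustrative assumptions, not the setting of the cited works.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 512, 256                                 # illustrative large-dimensional regime, p/n -> c_0 = 2

# Two classes whose means differ by O(1) before the 1/sqrt(p) normalization
mu = 2.0 * np.ones(p) / np.sqrt(p)              # ||mu|| = 2
Z = rng.standard_normal((n, p))
signs = np.repeat([1.0, -1.0], n // 2)[:, None]
X = (Z + signs * mu) / np.sqrt(p)               # rows x_i with ||x_i|| close to 1

# Distance kernel K_ij = f(||x_i - x_j||^2) with f(t) = exp(-t/2)
sq = np.sum(X ** 2, axis=1)
D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
K = np.exp(-D2 / 2.0)

# Off-diagonal entries concentrate around f(2) = exp(-1), irrespective of class,
# which is what licenses the entrywise Taylor expansion of f mentioned in the quote.
off = K[~np.eye(n, dtype=bool)]
print(off.mean(), off.std(), np.exp(-1.0))
```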