2018 · Preprint
DOI: 10.48550/arxiv.1806.01316

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach

Abstract: The Fisher information matrix (FIM) is a fundamental quantity to represent the characteristics of a stochastic model, including deep neural networks (DNNs). The present study reveals novel statistics of FIM that are universal among a wide class of DNNs. To this end, we use random weights and large width limits, which enables us to utilize mean field theories. We investigate the asymptotic statistics of the FIM's eigenvalues and reveal that most of them are close to zero while the maximum takes a huge value. Th…
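
The abstract's central claim, that most FIM eigenvalues lie near zero while the largest one is huge, is easy to probe numerically. The sketch below is illustrative only, not the paper's code: the width, input dimension, sample count, and tanh nonlinearity are all assumptions. It draws a one-hidden-layer network with i.i.d. Gaussian weights, stacks the per-example output gradients, and reads off the empirical FIM spectrum through the small Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, N = 100, 300, 200            # input dim, hidden width, sample count (assumed)
W1 = rng.standard_normal((m, d))   # i.i.d. Gaussian first-layer weights
w2 = rng.standard_normal(m)        # i.i.d. Gaussian output weights
X = rng.standard_normal((N, d))    # random Gaussian inputs

def grad_f(x):
    # Gradient of f(x) = w2 @ tanh(W1 @ x / sqrt(d)) / sqrt(m) w.r.t. all parameters.
    q = W1 @ x / np.sqrt(d)                    # preactivations
    h = np.tanh(q)                             # hidden activations
    g_w2 = h / np.sqrt(m)                      # df/dw2
    g_W1 = np.outer(w2 * (1.0 - h**2) / np.sqrt(m), x / np.sqrt(d))  # df/dW1
    return np.concatenate([g_W1.ravel(), g_w2])

G = np.stack([grad_f(x) for x in X])           # N x P per-example gradients
P = G.shape[1]
# The nonzero eigenvalues of the empirical FIM F = G.T @ G / N coincide with
# those of the N x N Gram matrix G @ G.T / N, which is cheap to diagonalize.
lam = np.linalg.eigvalsh(G @ G.T / N)
print("largest FIM eigenvalue      :", lam[-1])
print("mean over all P eigenvalues :", lam.sum() / P)   # = trace(F) / P
```

Because the gradient matrix has rank at most N, far below the parameter count P, the mean eigenvalue is forced well below the maximum, which is the qualitative picture the abstract describes.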

Cited by 21 publications (26 citation statements) · References 25 publications

“…where $q^{(k)} = W^{(1)} x^{(k)} / \sqrt{d}$ and $D_1^{(k)} = \mathrm{diag}(\sigma'(q^{(k)}))$. Our work is then to evaluate the asymptotics of the right side of (25). Toward this end, we first show that we can replace $Q(w^{(2)})$ in the latter by the simpler matrix…”
Section: Proof of Theorem 3.10 (mentioning)
confidence: 99%
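
For concreteness, the quantities named in this excerpt can be formed directly. The following is a minimal sketch of the scaled preactivations $q^{(k)}$ and the diagonal matrix $D_1^{(k)}$, assuming $\sigma = \tanh$ and arbitrary sizes (neither is specified in the excerpt):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 100, 200                       # input dim and width (assumed)
W1 = rng.standard_normal((m, d))      # first-layer weights W^(1)
x = rng.standard_normal(d)            # a single input x^(k)

def sigma_prime(q):
    return 1.0 - np.tanh(q) ** 2      # sigma' for sigma = tanh (assumption)

q = W1 @ x / np.sqrt(d)               # q^(k) = W^(1) x^(k) / sqrt(d)
D1 = np.diag(sigma_prime(q))          # D_1^(k) = diag(sigma'(q^(k)))
print(q.shape, D1.shape)              # -> (200,) (200, 200)
```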
“…The spectrum of the Fisher information matrix at initialization for one hidden layer is calculated in [40]. The Fisher matrix for deep neural networks in the mean-field limit is studied in [25].…”
Section: Related Literature (mentioning)
confidence: 99%
“…Gaussian weights and biases. In this section, we provide background information and briefly recall the formalism of Karakida et al. (2018), which first computes spectral properties of the Fisher Information of a neural network and then relates it to the maximal stable learning rate.…”
Section: Preliminaries (mentioning)
confidence: 99%
“…This assumption holds true for a large class of losses, including squared loss and cross-entropy loss. Let $I_\theta$ denote the Fisher Information Matrix (FIM) associated with the parametric family induced by the loss. If $\theta$ is initialized in a sufficiently small neighborhood of $\theta^*$, then by expanding the population loss $L(\theta)$ to quadratic order about $\theta^*$ one can show that a necessary condition for convergence is that the step size is bounded from above by (LeCun et al., 2012; Karakida et al., 2018)…”
Section: Fisher Information Matrix and Learning Dynamics (mentioning)
confidence: 99%
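
The bound itself is truncated in the excerpt. In the classical quadratic-expansion argument that the cited works build on, gradient descent on the local quadratic converges only if the step size satisfies $\eta < 2/\lambda_{\max}$, with $\lambda_{\max}$ the largest eigenvalue of the curvature matrix at $\theta^*$. A toy check with a hypothetical 2x2 curvature matrix:

```python
import numpy as np

# Toy check of eta < 2 / lambda_max on an exact quadratic loss
# L(theta) = 0.5 * (theta - theta_star) @ H @ (theta - theta_star).
# H and all constants below are hypothetical illustration values.
H = np.diag([10.0, 0.1])                      # curvature at theta* (e.g. the FIM)
theta_star = np.zeros(2)
lam_max = np.linalg.eigvalsh(H)[-1]           # largest eigenvalue (= 10)

def run_gd(eta, steps=200):
    theta = np.array([1.0, 1.0])
    for _ in range(steps):
        theta = theta - eta * H @ (theta - theta_star)   # exact gradient step
    return np.linalg.norm(theta - theta_star)

print("eta just below 2/lambda_max:", run_gd(0.9 * 2 / lam_max))  # -> small (converges)
print("eta just above 2/lambda_max:", run_gd(1.1 * 2 / lam_max))  # -> huge (diverges)
```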