Proceedings of the 2020 SIAM International Conference on Data Mining (SDM 2020)
DOI: 10.1137/1.9781611976236.57

Heavy-Tailed Universality Predicts Trends in Test Accuracies for Very Large Pre-Trained Deep Neural Networks

Abstract: Given two or more Deep Neural Networks (DNNs) with the same or similar architectures, and trained on the same dataset, but trained with different solvers, parameters, hyper-parameters, regularization, etc., can we predict which DNN will have the best test accuracy, and can we do so without peeking at the test data? In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this. HT-SR suggests, among other things, that modern DNNs exhibit what we call Heavy-Tailed Mech…
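
The workflow the abstract alludes to can be illustrated with a rough sketch: compute the empirical spectral density (ESD) of each layer's weight correlation matrix, fit its tail with a power law, and use the average exponent α to rank models without touching test data. The sketch below is an illustration under assumed details (PyTorch models, the `powerlaw` fitting package, a minimum-size filter of 50); it is not the authors' exact procedure.

```python
# Hedged sketch: rank trained models by an HT-SR-style average power-law
# exponent alpha, computed from the weights alone (no test data).
# Assumes PyTorch models and the `powerlaw` package (pip install powerlaw);
# details differ from the authors' exact pipeline.
import numpy as np
import powerlaw
import torch


def layer_alpha(weight: torch.Tensor) -> float:
    """Fit a power-law exponent alpha to the ESD of one weight matrix."""
    W = weight.detach().cpu().numpy()
    W = W.reshape(W.shape[0], -1)            # flatten conv kernels to 2-D
    N = max(W.shape)
    # Eigenvalues of the correlation matrix X = W^T W / N (the ESD support).
    eigs = np.linalg.eigvalsh(W.T @ W / N)
    eigs = eigs[eigs > 1e-10]
    fit = powerlaw.Fit(eigs, verbose=False)  # MLE fit of the heavy tail
    return fit.power_law.alpha


def average_alpha(model: torch.nn.Module) -> float:
    """Average alpha over all large-enough Linear/Conv2d layers.

    In the HT-SR papers, smaller average alpha tends to track better test
    accuracy among similarly trained models of the same architecture.
    """
    alphas = [
        layer_alpha(m.weight)
        for m in model.modules()
        if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))
        and min(m.weight.shape[0], m.weight[0].numel()) >= 50
    ]
    return float(np.mean(alphas))


# Usage (assumed example): compare two pre-trained models of similar design.
# import torchvision.models as models
# print(average_alpha(models.resnet18(weights="IMAGENET1K_V1")))
# print(average_alpha(models.resnet34(weights="IMAGENET1K_V1")))
```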

Cited by 24 publications (47 citation statements). References: 42 publications.

“…The first two metrics are well-known in ML. The last two metrics deserve special mention, as they depend on an empirical parameter α that is the PL exponent that arises in the recently developed Heavy Tailed Self Regularization (HT-SR) Theory [1][2][3].…”
Section: Results (mentioning)
confidence: 99%
“…In the HT-SR Theory, one analyzes the eigenvalue spectrum, i.e., the Empirical Spectral Density (ESD), of the associated correlation matrices [1][2][3]. From this, one characterizes the amount and form of correlation, and therefore implicit self-regularization, present in the DNN's weight matrices.…”
Section: Results (mentioning)
confidence: 99%
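
As a hedged illustration of what "analyzing the ESD" means in practice, the snippet below histograms the eigenvalues of X = W^T W / N for a single layer on log-log axes; a roughly straight tail there is the heavy-tailed signature that a power-law fit targets. The matrix here is a random stand-in, not a trained layer, so the layer shape and size are assumptions for the example.

```python
# Hedged sketch: visualize the empirical spectral density (ESD) of one
# weight matrix's correlation matrix. A straight-line tail on log-log
# axes is the heavy-tailed signature HT-SR looks for. Note: a random
# Gaussian stand-in like this yields a Marchenko-Pastur bulk with no
# heavy tail; a well-trained layer would show a visibly heavier tail.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 2048))     # stand-in for a trained N x M layer
N = max(W.shape)

eigs = np.linalg.eigvalsh(W.T @ W / N)   # eigenvalues of X = W^T W / N
eigs = eigs[eigs > 1e-10]

bins = np.logspace(np.log10(eigs.min()), np.log10(eigs.max()), 60)
plt.hist(eigs, bins=bins, density=True)
plt.xscale("log")
plt.yscale("log")
plt.xlabel("eigenvalue of X = W^T W / N")
plt.ylabel("density")
plt.title("ESD on log-log axes (heavy tail -> power-law fit is appropriate)")
plt.show()
```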
“…4 and 5) imposes intriguing characteristics on ML-generated disordered structures: "scale-free" properties on waves. Scale-free properties, which represent the power-law probabilistic distribution with heavy-tailed statistics, have been one of the most influential concepts in network science 2,52, data science 50,51, and random matrix theory 53,54. In addition to its ubiquitous nature in biological, social, and technological systems 2, the most important impact of the scale-free property is the emergence of core nodes, also known as "hubs", which possess a very large number of links or interactions, thereby governing signal transport inside the system 2,42,52.…”
Section: Results (mentioning)
confidence: 99%
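
For reference, the "power-law probabilistic distribution with heavy-tailed statistics" invoked here is, in the standard convention (a textbook definition, not specific to the cited work):

p(x) \propto x^{-\alpha}, \qquad x \ge x_{\min}, \ \alpha > 1.

Heavy-tailedness shows up in the moments: \mathbb{E}[x^k] diverges for k \ge \alpha - 1, so for 2 < \alpha \le 3 the mean is finite but the variance is not.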
“…Because the ML-generated lattice deformation is strongly related to the weights of the output neurons in the L2D CNN, the apparent stochastic difference between normal-random seed structures and scale-free L2D CNN outputs raises an interesting open question: the training process of deep NNs could inherently possess the scale-free property. Recently, in random matrix theory, it was demonstrated that the correlations in the weight matrices of well-trained deep NNs can be fit to a power law with a heavy-tailed distribution 53,54. This theory enables a successful analogy between NN structures and ML-generated real-space wave structures in our result: the identification of the "heavy-tailed perturbation distribution" of atomic sites using the "heavy-tailed weight distribution" of CNN neurons.…”
Section: Discussion (mentioning)
confidence: 99%