The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss. It is a fundamental object of study, closely tied to various problems in deep learning, including model design, optimization, and generalization. Most prior work has been empirical, typically focusing on low-rank approximations and heuristics that are blind to the network structure. In contrast, we develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency as well as the structural reasons behind it. This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks, allowing for an elegant interpretation in terms of rank deficiency. Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank for a larger class of models, such as rectified and hyperbolic tangent networks. Further, we investigate the implications of model architecture (e.g. width, depth, bias) on the rank deficiency. Overall, our work provides novel insights into the source and extent of redundancy in overparameterized networks.

* A detailed list of contributions is as follows: Sidak first discovered that the Hessian rank formula, in an early form, holds experimentally to high fidelity, thus kick-starting the project. Sidak came up with the proof technique and proved Theorem 3, Theorem 5, Theorem 9, and Theorem 12. Sidak wrote essentially the entire paper and noted the rank-deficiency interpretation. Gregor proved Lemma 8, assisted in a part of Theorem 3, and empirically observed the eventual formula for the Hessian rank. Gregor ran essentially all the experiments for the final submission.
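To make the notion of a "numerical Hessian rank" concrete, the following is a minimal sketch (not the authors' code) of how one might estimate it for a small deep linear network: build the loss as a function of the flattened parameters, form the full Hessian with automatic differentiation, and count singular values above a tolerance. The layer widths, synthetic data, squared loss, and tolerance rule are illustrative assumptions, not taken from the paper.

```python
import torch

torch.manual_seed(0)

# Deep linear network x -> W3 W2 W1 x with illustrative widths (no biases).
dims = [5, 4, 3, 2]                                   # input, hidden, hidden, output
shapes = [(dims[i + 1], dims[i]) for i in range(len(dims) - 1)]
sizes = [r * c for r, c in shapes]

X = torch.randn(20, dims[0])                          # synthetic inputs
Y = torch.randn(20, dims[-1])                         # synthetic targets

def loss(theta):
    # Unflatten the parameter vector into the layer matrices W1, W2, W3.
    Ws, offset = [], 0
    for (r, c), n in zip(shapes, sizes):
        Ws.append(theta[offset:offset + n].reshape(r, c))
        offset += n
    out = X
    for W in Ws:                                      # purely linear layers
        out = out @ W.T
    return 0.5 * ((out - Y) ** 2).mean()              # squared loss

theta0 = torch.randn(sum(sizes))
H = torch.autograd.functional.hessian(loss, theta0)   # full (n x n) Hessian

# Numerical rank: number of singular values above a relative tolerance.
svals = torch.linalg.svdvals(H)
tol = svals.max() * len(theta0) * torch.finfo(svals.dtype).eps
print("parameters:", len(theta0), "numerical Hessian rank:", int((svals > tol).sum()))
```

Run on a network like this, the counted rank typically falls well below the parameter count, which is the rank deficiency the abstract refers to; the exact gap predicted by the paper's formulas depends on the widths and depth.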