2019
DOI: 10.48550/arxiv.1901.03611
Preprint

The Benefits of Over-parameterization at Initialization in Deep ReLU Networks

Abstract: It has been noted in the existing literature that over-parameterization in ReLU networks generally improves performance. While several factors could be involved, we prove some desirable theoretical properties at initialization that may be enjoyed by ReLU networks. Specifically, it is known that He initialization in deep ReLU networks asymptotically preserves the variance of activations in the forward pass and the variance of gradients in the backward pass for infinitely wide networks, thus preserving th…
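
As a rough illustration of the forward-pass property summarized in the abstract, the following minimal sketch (not taken from the paper; widths, depth, and batch size are arbitrary choices) checks numerically that He-initialized weights keep the scale of hidden activations roughly constant with depth in a deep ReLU network:

import numpy as np

rng = np.random.default_rng(0)
width, depth, batch = 2048, 20, 64

x = rng.standard_normal((batch, width))
h = x
for layer in range(depth):
    # He initialization: i.i.d. Gaussian weights with variance 2 / fan_in
    W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
    h = np.maximum(h @ W.T, 0.0)  # ReLU activation

# The mean squared activation stays close to that of the input (about 1.0 here),
# i.e. the activation scale neither explodes nor vanishes with depth.
print("input  mean square:", np.mean(x ** 2))
print("output mean square:", np.mean(h ** 2))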

Cited by 14 publications (22 citation statements)
References 9 publications
“…Theorem 4.1 is new in its own right and improves on a previous result by Giryes et al. [GSB16]; see also [GSB20]. It is also closely related to the work of Arpit and Bengio [AB19], who investigated the capability of a random ReLU layer as in Definition 1.2, but with bias b = 0, to preserve Euclidean norms.…”
Section: Random Embeddings (supporting)
confidence: 66%
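
The norm-preservation property discussed in this statement can be illustrated with a short numerical check. The sketch below is an illustration under stated assumptions, not code from either cited paper: it applies a single random ReLU layer with zero bias and Gaussian weights of variance 2/m, where m is the output width, and shows that the squared Euclidean norm of the input is preserved in expectation and concentrates as m grows.

import numpy as np

rng = np.random.default_rng(1)
d_in = 256
x = rng.standard_normal(d_in)

for m in (64, 1024, 16384):
    # Zero-bias ReLU layer with weight variance 2/m (output width m)
    W = rng.standard_normal((m, d_in)) * np.sqrt(2.0 / m)
    h = np.maximum(W @ x, 0.0)
    print(f"m = {m:5d}   ||h||^2 / ||x||^2 = {np.dot(h, h) / np.dot(x, x):.3f}")

# The ratio concentrates around 1.0 as the layer gets wider, i.e. a wide
# (over-parameterized) random ReLU layer approximately preserves Euclidean norms.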
“…Altering the derivation to include rectified linear units (ReLU) led to the "He" initialization (He et al., 2015). "He" initialization also seems to be favorably related to over-parameterization of networks (Arpit & Bengio, 2019). Similar to these early works, more recent work has also considered activation scale (Hanin & Rolnick, 2018) and gradients (Balduzzi et al., 2017), as well as dynamical isometry properties (Saxe et al., 2013; Yang & Schoenholz, 2017).…”
Section: Amount Of Data (mentioning)
confidence: 94%
“…1 (a), it is clear that overparametrization increases the number of subdivision lines and the chance of well positioning some of them. In addition, overparametrization has also been used favorably as a way to help gradient-descent-based techniques by facilitating optimization (Arpit & Bengio, 2019); lastly, overparametrization also positions the initial parameters close to good local minima (Allen-Zhu et al., 2019; Zou & Gu, 2019; Kawaguchi et al., 2019), reducing the number of updates needed during training. We formalize those points in the remark below.…”
Section: The Initialization Dilemma and The Importance Of Overparamet... (mentioning)
confidence: 99%