Soufiane Hayou scite author profile

Soufiane Hayou

5 Publications

69 Citation Statements Received

56 Citation Statements Given

Publications

On the Selection of Initialization and Activation Function for Deep Neural Networks

Hayou, Doucet, Rousseau · 2018 · Preprint

The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information about the input during forward propagation and to the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the 'edge of chaos' can lead to good performance. We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, information indeed propagates deeper for an initialization at the edge of chaos. By further extending this analysis, we identify a class of activation functions that improve the information propagation over ReLU-like functions. This class includes the Swish activation, $\phi_{\mathrm{swish}}(x) = x \cdot \mathrm{sigmoid}(x)$, used in Hendrycks & Gimpel (2016), Elfwing et al. (2017) and Ramachandran et al. (2017). This provides a theoretical grounding for the excellent empirical performance of $\phi_{\mathrm{swish}}$ observed in these contributions. We complement those previous results by illustrating the benefit of using a random initialization on the edge of chaos in this context.
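The edge-of-chaos analysis rests on a mean-field recursion for the pre-activation variance of a wide random network, q_{l+1} = σ_b² + σ_w² E[φ(√q_l Z)²] with Z ~ N(0, 1). The following is a minimal sketch of that standard recursion, not the paper's own code; the function names, the 101-point Gauss-Hermite quadrature, and the (σ_w², σ_b²) values are illustrative assumptions. It also defines the Swish activation φ_swish(x) = x · sigmoid(x).

```python
import numpy as np

# Probabilists' Gauss-Hermite rule: sum(w * f(x)) approximates the integral of f(x) * exp(-x**2 / 2).
NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(101)
NORM = np.sqrt(2.0 * np.pi)  # dividing by sqrt(2*pi) turns the rule into an expectation under N(0, 1)


def relu(x):
    return np.maximum(x, 0.0)


def swish(x):
    # phi_swish(x) = x * sigmoid(x) = x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))


def gaussian_expectation(f):
    """Approximate E[f(Z)] for Z ~ N(0, 1)."""
    return float(np.sum(WEIGHTS * f(NODES)) / NORM)


def variance_map(q, phi, sigma_w2, sigma_b2):
    """One layer of q_{l+1} = sigma_b^2 + sigma_w^2 * E[phi(sqrt(q_l) Z)^2]."""
    second_moment = gaussian_expectation(lambda z: phi(np.sqrt(q) * z) ** 2)
    return sigma_b2 + sigma_w2 * second_moment


def propagate_variance(q0, phi, sigma_w2, sigma_b2, depth):
    """Return [q_0, q_1, ..., q_depth] for a network of the given depth."""
    qs = [q0]
    for _ in range(depth):
        qs.append(variance_map(qs[-1], phi, sigma_w2, sigma_b2))
    return qs


if __name__ == "__main__":
    # ReLU at (sigma_w^2, sigma_b^2) = (2, 0) versus a smaller weight variance.
    print(propagate_variance(1.0, relu, 2.0, 0.0, depth=50)[-1])
    print(propagate_variance(1.0, relu, 1.5, 0.0, depth=50)[-1])
```

For ReLU, E[ReLU(√q Z)²] = q/2, so with σ_w² = 2 and σ_b² = 0 (the ReLU edge-of-chaos point in this mean-field picture) the variance stays at 1 up to rounding, while σ_w² = 1.5 shrinks it by a factor of 0.75 per layer, to roughly 5.7e-7 after 50 layers.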

Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit

Hayou, Doucet, Rousseau · 2019 · Preprint

On the Impact of the Activation Function on Deep Neural Networks Training

Hayou, Doucet, Rousseau · 2019 · Preprint

From Optimization Dynamics to Generalization Bounds via Łojasiewicz Gradient Inequality

Liu, Yang, Hayou, et al. · 2022 · Preprint

Optimization and generalization are two essential aspects of machine learning. In this paper, we propose a framework to connect optimization with generalization by analyzing the generalization error based on the length of the optimization trajectory under the gradient flow algorithm after convergence. Through our approach, we show that, with a proper initialization, gradient flow converges following a short path with an explicit length estimate. Such an estimate induces a length-based generalization bound, showing that short optimization paths after convergence are associated with good generalization, which also matches our numerical results. Our framework can be applied to broad settings. For example, we use it to obtain generalization estimates on three distinct machine learning models: underdetermined $\ell_p$ linear regression, kernel regression, and overparameterized two-layer ReLU neural networks.
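The central quantity in this framework is the length of the gradient-flow trajectory. A minimal sketch, assuming squared loss in an underdetermined linear model (only the ℓ_2 special case, not the paper's general setting) and approximating gradient flow by gradient descent with a small step size, measures that length as the sum of step norms. The function name, step size, and step count are illustrative assumptions, and the sketch does not compute the paper's generalization bound.

```python
import numpy as np


def gradient_flow_path(X, y, theta0, lr=1e-2, steps=10_000):
    """Approximate gradient flow on L(theta) = ||X @ theta - y||^2 / (2n) by small-step gradient descent.

    Returns the final parameters and the trajectory length sum_k ||theta_{k+1} - theta_k||.
    """
    n = X.shape[0]
    theta = np.array(theta0, dtype=float)  # copy so the caller's array is untouched
    length = 0.0
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / n
        step = lr * grad
        theta -= step
        length += float(np.linalg.norm(step))
    return theta, length


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 20))  # 5 samples, 20 parameters: underdetermined
    y = rng.normal(size=5)
    theta, length = gradient_flow_path(X, y, np.zeros(20))
    print("residual:", np.linalg.norm(X @ theta - y))
    print("path length:", length, "straight-line distance:", np.linalg.norm(theta))
```

Starting from zero, the iterates stay in the row space of X and converge to the minimum-norm interpolant `np.linalg.pinv(X) @ y`; by the triangle inequality the accumulated path length is always at least the straight-line distance from the initialization.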

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

Hayou, He, Dziugaite · 2021 · Preprint

We study an approach to learning pruning masks by optimizing the expected loss of stochastic pruning masks, i.e., masks which zero out each weight independently with some weight-specific probability. We analyze the training dynamics of the induced stochastic predictor in the setting of linear regression, and observe a data-adaptive L1 regularization term, in contrast to the data-adaptive L2 regularization term known to underlie dropout in linear regression. We also observe a preference to prune weights that are less well-aligned with the data labels. We evaluate probabilistic fine-tuning for optimizing stochastic pruning masks for neural networks, starting from masks produced by several baselines (namely, magnitude pruning [1], SNIP [2], and random masks). In each case, we see improvements in test error over baselines, even after we threshold fine-tuned stochastic pruning masks. Finally, since a stochastic pruning mask induces a stochastic neural network, we consider training the weights and/or pruning probabilities simultaneously to minimize a PAC-Bayes bound on generalization error. Using data-dependent priors [3], we obtain a self-bounded learning algorithm with strong performance and numerically tight bounds. In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the "prior" and "posterior" data.
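A stochastic pruning mask keeps weight i with probability p_i. For linear regression with squared loss, the expected loss under independent Bernoulli masks splits into the loss of the mean predictor plus a variance term Σ_i p_i(1 − p_i) x_i² w_i² that depends on the data through x_i². The sketch below checks that standard decomposition against Monte Carlo sampling. It is an illustration under these assumptions with hypothetical function names; it does not reproduce the paper's derivation of its L1 term or its fine-tuning procedure.

```python
import numpy as np


def expected_masked_loss(X, y, w, keep_prob):
    """Closed form of E_m[mean_j (y_j - x_j^T (m * w))^2] for independent m_i ~ Bernoulli(keep_prob_i)."""
    mean_pred = X @ (keep_prob * w)  # prediction of the averaged (mean-mask) model
    fit_term = np.mean((y - mean_pred) ** 2)
    # Per-example variance of x^T (m * w) is sum_i p_i (1 - p_i) x_i^2 w_i^2; average it over examples.
    var_term = np.mean((X ** 2) @ (keep_prob * (1.0 - keep_prob) * w ** 2))
    return float(fit_term + var_term)


def monte_carlo_masked_loss(X, y, w, keep_prob, n_samples=50_000, seed=0):
    """Estimate the same expectation by sampling masks."""
    rng = np.random.default_rng(seed)
    masks = rng.random((n_samples, w.size)) < keep_prob  # True means the weight is kept
    preds = X @ (masks * w).T  # shape (n_examples, n_samples)
    return float(np.mean((y[:, None] - preds) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 10))
    w = rng.normal(size=10)
    y = X @ w + 0.1 * rng.normal(size=50)
    p = rng.uniform(0.3, 0.9, size=10)
    print(expected_masked_loss(X, y, w, p), monte_carlo_masked_loss(X, y, w, p))
```

Setting every keep probability to 1 removes the variance term and recovers the ordinary mean squared error of the unpruned model.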
