Chris Mingard scite author profile

Chris Mingard

2 Publications

10 Citation Statements Received

46 Citation Statements Given


Publications


Neural networks are a priori biased towards Boolean functions with low entropy

Mingard, Skalse, Valle-Pérez et al. 2019

Preprint


Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks (a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term), we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0, 1\}^n$ as 1 has a remarkably simple form: $P(t) = 2^{-n}$ for $0 \le t < 2^n$. Since a perceptron can express far fewer Boolean functions with small or large values of $t$ (low "entropy") than with intermediate values of $t$ (high "entropy"), there is, on average, a strong intrinsic a priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed $t$, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect.
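The uniform form of $P(t)$ stated in this abstract can be checked numerically for small $n$. The following is a minimal sketch, not the authors' code: it assumes i.i.d. standard Gaussian weights and the convention that an input is classified as 1 when the pre-activation is strictly positive. The function name `empirical_t_distribution` and its parameters are hypothetical, chosen only for illustration.

```python
import itertools
import numpy as np


def empirical_t_distribution(n, num_samples, seed=0):
    """Estimate P(t) for a bias-free perceptron on {0,1}^n by sampling.

    Assumptions (not taken from the abstract): weights are i.i.d. standard
    Gaussian, and an input x is classified as 1 when w . x > 0.
    """
    rng = np.random.default_rng(seed)
    # Enumerate all 2^n Boolean inputs as rows of a (2^n, n) array.
    inputs = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    # One random weight vector per sample, shape (num_samples, n).
    weights = rng.standard_normal((num_samples, n))
    # t = number of inputs classified as 1 for each sampled weight vector.
    t = (weights @ inputs.T > 0).sum(axis=1)
    # Empirical frequency of each value of t in 0 .. 2^n - 1.
    counts = np.bincount(t, minlength=2**n)
    return counts / num_samples


probs = empirical_t_distribution(n=3, num_samples=200_000)
print(probs)       # each entry should be close to 2**-3
print(2.0 ** -3)
```

Under these assumptions the all-zero input always has pre-activation 0 and is never classified as 1, so $t$ never reaches $2^n$, which matches the range $0 \le t < 2^n$ in the abstract.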


Feature Learning and Signal Propagation in Deep Neural Networks

Yizhang, Mingard, Nam et al. 2021

Preprint


Modern Deep Neural Networks (DNNs) exhibit impressive generalization properties on a variety of tasks without explicit regularization, suggesting the existence of hidden regularization effects. Recent work by Baratin et al. (2021) sheds light on an intriguing implicit regularization effect, showing that some layers are much more aligned with data labels than other layers. This suggests that as the network grows in depth and width, an implicit layer selection phenomenon occurs during training. In this work, we provide the first explanation for this alignment hierarchy. We introduce and empirically validate the Equilibrium Hypothesis which states that the layers that achieve some balance between forward and backward information loss are the ones with the highest alignment to data labels. Our experiments demonstrate an excellent match with the theoretical predictions.



