Gabriele Perugini scite author profile

The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they generalize better than sharp ones. In this work we first discuss the relationship between two alternative measures of flatness: the local entropy, which is useful for analysis and algorithm development, and the local energy, which is easier to compute and was shown empirically, in extensive tests on state-of-the-art networks, to be the best predictor of generalization capabilities. We show semi-analytically in simple controlled scenarios that these two measures correlate strongly with each other and with generalization. Then, we extend the analysis to the deep learning scenario through extensive numerical validation. We study two algorithms, entropy-stochastic gradient descent and replicated-stochastic gradient descent, that explicitly include the local entropy in the optimization objective. We devise a training schedule by which we consistently find flatter minima (according to both flatness measures) and improve the generalization error for common architectures (e.g. ResNet, EfficientNet).
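As a rough illustration of the local energy mentioned in the abstract above, the sketch below estimates the average increase in a loss when the weights are perturbed by Gaussian noise. It is only a toy under stated assumptions and not the authors' implementation: the additive noise model, the quadratic toy losses, and the names `local_energy` and `make_quadratic` are all hypothetical choices made for this example.

```python
import numpy as np


def local_energy(loss, w, sigma, n_samples=1000, seed=0):
    """Monte Carlo estimate of the mean loss increase under Gaussian weight noise.

    Assumption: additive isotropic noise w + sigma * z with z ~ N(0, I).
    """
    rng = np.random.default_rng(seed)
    base = loss(w)
    noise = rng.normal(size=(n_samples, w.size))
    perturbed = np.array([loss(w + sigma * z) for z in noise])
    return perturbed.mean() - base


def make_quadratic(curvature):
    """Toy loss with a single minimum at the origin (hypothetical stand-in)."""
    return lambda w: 0.5 * curvature * np.sum(w ** 2)


# Both toy losses share the same minimizer; only the curvature differs.
w_star = np.zeros(10)
flat = local_energy(make_quadratic(0.1), w_star, sigma=0.5)
sharp = local_energy(make_quadratic(10.0), w_star, sigma=0.5)

print(f"flat minimum:  {flat:.4f}")
print(f"sharp minimum: {sharp:.4f}")
```

In this toy setting the high-curvature minimum yields a larger estimate, which is the sense in which a small local energy indicates a flat minimum. For a real network, `loss` would be the training error evaluated on a dataset.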


Unveiling the Structure of Wide Flat Minima in Neural Networks

Baldassi

Clarissa

Malatesta

et al. 2021

Phys. Rev. Lett.


Improved belief propagation algorithm finds many Bethe states in the random-field Ising model on random graphs

Perugini

Ricci-Tersenghi

2018

Phys. Rev. E


We first present an empirical study of the Belief Propagation (BP) algorithm run on the random-field Ising model defined on random regular graphs in the zero-temperature limit. We introduce the notion of extremal solutions for the BP equations, and we use them to fix a fraction of spins in their ground-state configuration. At the phase transition point the fraction of unconstrained spins percolates and their number diverges with the system size. This in turn makes the associated optimization problem highly nontrivial in the critical region. Using the bounds on the BP messages provided by the extremal solutions, we design a new, easy-to-implement BP scheme that outputs a large number of stable fixed points. On the one hand, this new algorithm provides the minimum-energy configuration with high probability in a competitive time. On the other hand, we find that the number of fixed points of the BP algorithm grows with the system size in the critical region. This unexpected feature poses new relevant questions about the physics of this class of models.


Learning through atypical phase transitions in overparameterized neural networks

et al. 2022


Deep learning via message passing algorithms based on belief propagation

Lucibello

Pittorino

Perugini

et al. 2022

Mach. Learn.: Sci. Technol.


Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. They yield exact marginals on tree-like graphical models and have also proven to be effective in many problems defined on loopy graphs, from inference to optimization, from signal processing to clustering. The BP-based schemes are fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present and adapt to mini-batch training on GPUs a family of BP-based message-passing algorithms with a reinforcement term that biases distributions towards locally entropic solutions. These algorithms are capable of training multi-layer neural networks with performance comparable to SGD heuristics in a diverse set of experiments on natural datasets, including multi-class image classification and continual learning, while being capable of yielding improved performance on sparse networks. Furthermore, they allow one to make approximate Bayesian predictions that have higher accuracy than point-wise ones.


