Itay Golan scite author profile

Catastrophic forgetting is the notorious vulnerability of neural networks to the changes in the data distribution during learning. This phenomenon has long been considered a major obstacle for using learning agents in realistic continual learning settings. A large body of continual learning research assumes that task boundaries are known during training. However, only a few works consider scenarios in which task boundaries are unknown or not well defined: task-agnostic scenarios. The optimal Bayesian solution for this requires an intractable online Bayes update to the weights posterior. We aim to approximate the online Bayes update as accurately as possible. To do so, we derive novel fixed-point equations for the online variational Bayes optimization problem for multivariate gaussian parametric distributions. By iterating the posterior through these fixed-point equations, we obtain an algorithm (FOO-VB) for continual learning that can handle nonstationary data distribution using a fixed architecture and without using external memory (i.e., without access to previous data). We demonstrate that our method (FOO-VB) outperforms existing methods in task-agnostic scenarios. FOO-VB Pytorch implementation is available at https://github.com/chenzeno/FOO-VB.

show abstract

Kernel and Rich Regimes in Overparametrized Models

Woodworth¹,

Gunasekar²,

Lee³

et al. 2020

Preprint

View full text Add to dashboard Cite

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach [5], we show how the scale of the initialization controls the transition between the "kernel" (aka lazy) and "rich" (aka active) regimes and affects generalization properties in multilayer homogeneous models. We also highlight an interesting role for the width of a model in the case that the predictor is not identically zero at initialization. We provide a complete and detailed analysis for a family of simple depth-D models that already exhibit an interesting and meaningful transition between the kernel and rich regimes, and we also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.

show abstract

Kernel and Rich Regimes in Overparametrized Models

Woodworth¹,

Gunasekar²,

Savarese³

et al. 2019

Preprint

View full text Add to dashboard Cite

A recent line of work studies overparametrized neural networks in the "kernel regime," i.e. when the network behaves during training as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms. Building on an observation by Chizat and Bach [4], we show how the scale of the initialization controls the transition between the "kernel" (aka lazy) and "deep" (aka active) regimes and affects generalization properties in multilayer homogeneous models. We provide a complete and detailed analysis for a simple two-layer model that already exhibits an interesting and meaningful transition between the kernel and deep regimes, and we demonstrate the transition for more complex matrix factorization models.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Itay Golan

Task Agnostic Continual Learning Using Online Variational Bayes

Task-Agnostic Continual Learning Using Online Variational Bayes with Fixed-Point Updates

Kernel and Rich Regimes in Overparametrized Models

Kernel and Rich Regimes in Overparametrized Models

Contact Info

Product

Resources

About