Progressive Neural Networks

Rusu, Andrei; Rabinowitz, Neil C.; Desjardins, Guillaume; Soyer, Hubert; Kirkpatrick, James; Kavukcuoglu, Koray; Pascanu, Razvan; Hadsell, Raia

doi:10.48550/arxiv.1606.04671

Cited by 403 publications

(632 citation statements)

References 0 publications

Supporting

Mentioning: 629

Contrasting


“…The sample is first assigned to the proper task at inference time, and the corresponding model version is used. In (PNN) [Rusu et al, 2016], (DEN) [Yoon et al, 2018], and (RCL) [Xu and Zhu, 2018] new structural elements are added to the model for each new task, while in [Masse et al, 2018; Golkar et al, 2019; Wortsman et al, 2020] a large model is considered from which submodels are selected for subsequent tasks. Methods in this category exhibit high accuracy in a task incremental scenario when test samples are given with a corresponding task index [van de Ven and Tolias, 2019].…”

Section: Related Work (mentioning)

confidence: 99%

“…Continual learning (CL) is a machine learning domain that aims to mitigate catastrophic forgetting and enable models to be trained with an incoming stream of training data. This is usually achieved through regularization [Kirkpatrick et al, 2017], adaptation of model's architecture [Rusu et al, 2016] or replay of previous data examples. Typically, methods based on replay buffer achieve the best performance due to the high…”

Section: Introduction (mentioning)

confidence: 99%


Logarithmic Continual Learning

Masarczyk, Wawrzyński, Marczak et al. 2022

Preprint


We introduce a neural network architecture that logarithmically reduces the number of self-rehearsal steps in the generative rehearsal of continually learned models. In continual learning (CL), training samples come in subsequent tasks, and the trained model can access only a single task at a time. To replay previous samples, contemporary CL methods bootstrap generative models and train them recursively with a combination of current and regenerated past data. This recurrence leads to superfluous computations as the same past samples are regenerated after each task, and the reconstruction quality successively degrades. In this work, we address these limitations and propose a new generative rehearsal architecture that requires at most logarithmic number of retraining for each sample. Our approach leverages allocation of past data in a set of generative models such that most of them do not require retraining after a task. The experimental evaluation of our logarithmic continual learning approach shows the superiority of our method with respect to the state-of-the-art generative rehearsal methods.


“…This property is best showcased in the work (Eysenbach et al, 2018), where they learn diverse skills without any reward function. Furthermore, sequential learning and the need to retain previously known skills has always been a focus (Rusu et al, 2016; Kirkpatrick et al, 2017). In the space of multi-task reinforcement learning with neural networks, Teh et al (2017) proposed a framework that allows sharing of knowledge across tasks via a task agnostic prior.…”

Section: Multi-task Reinforcement Learning (mentioning)

confidence: 99%

Boosting Exploration in Multi-Task Reinforcement Learning using Adversarial Networks

Ramnath, Tristan, Bengio 2022

Preprint


We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space. Our approach is grounded on the intuition that nothing makes you learn better than a coevolving adversary. The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation. We also adapt existing measures of causal attribution to draw insights from the skills learned. Our experiments demonstrate that the adversarial process leads to a better exploration of multiple solutions and understanding the minimum number of different skills necessary to solve a given set of tasks.


“…According to the mechanism of memory consolidation, current approaches are categorized into three types: (i) Experiential rehearsal-based approaches, which focus on replaying episodic memory (Robins 1995), and the core of which is to select representative samples or features from historical data (Rebuffi et al 2017; Aljundi et al 2019; Bang et al 2021). (ii) Distributed memory representation approaches (Fernando et al 2017; Mallya and Lazebnik 2018), which allocate individual networks for specific knowledge to avoid interference, represented by Progressive Neural Networks (PNN) (Rusu et al 2016).…”

Section: Related Work (mentioning)

confidence: 99%

“…(Right): Similarly, the experiment was performed on CIFAR-10 for the first two tasks. The feature transfer, i.e., Progressive Neural Networks (Rusu et al 2016), is a feature transfer-based method. Compared with w/o transfer, it transfers features from the model of previous tasks to the current task's learning.…”

Section: Introduction (mentioning)

confidence: 99%

Overcome Anterograde Forgetting with Cycled Memory Networks

Peng, Ye, Tang et al. 2021

Preprint


Learning from a sequence of tasks for a lifetime is essential for an agent towards artificial general intelligence. This requires the agent to continuously learn and memorize new knowledge without interference. This paper first demonstrates a fundamental issue of lifelong learning using neural networks, named anterograde forgetting, i.e., preserving and transferring memory may inhibit the learning of new knowledge. This is attributed to the fact that the learning capacity of a neural network will be reduced as it keeps memorizing historical knowledge, and the fact that the conceptual confusion may occur as it transfers irrelevant old knowledge to the current task. This work proposes a general framework named Cycled Memory Networks (CMN) to address the anterograde forgetting in neural networks for lifelong learning. The CMN consists of two individual memory networks to store short-term and long-term memories to avoid capacity shrinkage. A transfer cell is designed to connect these two memory networks, enabling knowledge transfer from the long-term memory network to the short-term memory network to mitigate the conceptual confusion, and a memory consolidation mechanism is developed to integrate short-term knowledge into the long-term memory network for knowledge accumulation. Experimental results demonstrate that the CMN can effectively address the anterograde forgetting on several task-related, task-conflict, class-incremental and cross-domain benchmarks.
