The 2013 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2013.6707047
Mitigation of catastrophic forgetting in recurrent neural networks using a Fixed Expansion Layer

Abstract: Catastrophic forgetting (or catastrophic interference) in supervised learning systems is the drastic loss of previously stored information caused by the learning of new information. While substantial work has been published on addressing catastrophic forgetting in memoryless supervised learning systems (e.g. feedforward neural networks), the problem has received limited attention in the context of dynamic systems, particularly recurrent neural networks. In this paper, we introduce a solution for mitig…

Cited by 15 publications (8 citation statements). References 15 publications.
“…Another reason why this heuristic is often used is that long sentences may be problematic when training an LSTM, because of the well-known exploding/vanishing gradient issue, but also because of the issue of catastrophic forgetting in neural networks. While gradient-clipping and linear pass-through gated connections are suitable solutions to somewhat mitigate the former, the latter remains problematic in recurrent networks [7]. Conversely, truncating and padding may have undesired consequences.…”
Section: Systems Configuration (mentioning)
confidence: 99%
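For context on the gradient clipping the statement above refers to, here is a minimal NumPy sketch of global-norm clipping; the function name and the max_norm default are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def clip_gradient_norm(grads, max_norm=1.0):
    # Global L2 norm across all gradient arrays.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Rescale every gradient when the norm exceeds the threshold,
    # which bounds the size of each update step.
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```

Note that this addresses only the exploding-gradient problem; as the quoted statement observes, it does nothing for catastrophic forgetting.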
“…The Fixed Expansion Layer [24] introduced the use of a large, sparse layer to disentangle the model activations. Subsequently, the Fixed Expansion Layer has been applied to recurrent models [25]. However, in order to build the sparse layer in an optimal way, the model requires solving a quadratic optimization problem (feature-sign search algorithm), which can be problematic in real-world problems (as we discuss in Section 6).…”
Section: Survey Of Continual Learning In Recurrent Models (mentioning)
confidence: 99%
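The quoted statement describes the Fixed Expansion Layer as a large, sparse layer whose optimal construction requires sparse coding (the feature-sign search step). As a rough illustration only, the sketch below substitutes a much simpler scheme, a frozen random projection followed by k-winners-take-all sparsification, to show how a wide fixed layer can yield sparse, largely non-overlapping codes; the class and parameter names are mine, not from [24] or [25].

```python
import numpy as np

class FixedExpansionLayer:
    """Wide frozen projection + top-k sparsification (an illustrative
    stand-in for the sparse-coding construction in the paper)."""

    def __init__(self, in_dim, expand_dim, k, seed=0):
        rng = np.random.default_rng(seed)
        # Weights are fixed at initialization and never trained.
        self.W = rng.standard_normal((in_dim, expand_dim)) / np.sqrt(in_dim)
        self.k = k

    def encode(self, x):
        # x: 1-D input vector, projected into the wide layer...
        a = x @ self.W
        # ...then only the k largest activations are kept
        # (k-winners-take-all), so distinct inputs tend to activate
        # largely disjoint sets of units.
        out = np.zeros_like(a)
        top = np.argsort(a)[-self.k:]
        out[top] = a[top]
        return out
```

Because the expansion weights are fixed and the code is sparse, learning on a new task adjusts only the downstream weights that its few active units select, which is, roughly, the interference-reduction mechanism the FEL papers describe.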
“…Variants of SI have been used for different sequential datasets, but have not been systematically compared against other established methods [21,7,22]. Fixed expansion layers [23] are another method to limit the plasticity of weights and prevent forgetting, and in RNNs take the form of a sparsely activated layer between consecutive hidden states [24]. Lastly, some regularization approaches rely on the use of non-overlapping and orthogonal representations to overcome catastrophic forgetting [25,26,27].…”
Section: Related Work (mentioning)
confidence: 99%
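Since the statement above notes that in RNNs the fixed expansion layer takes the form of a sparsely activated layer between consecutive hidden states, here is a minimal sketch of one such recurrent step. This is a hypothetical helper that reuses the FixedExpansionLayer sketch above via its encode method; none of these names come from [23] or [24].

```python
import numpy as np

def fel_rnn_step(x_t, h_prev, W_xh, W_sh, b, sparse_encode):
    # Re-encode the previous hidden state into a sparse code before
    # it feeds back into the recurrence, so the update driven by one
    # input touches mainly the recurrent weights its code selects.
    s_prev = sparse_encode(h_prev)
    return np.tanh(x_t @ W_xh + s_prev @ W_sh + b)
```

The design intent, per the quoted statements, is to limit weight plasticity: the sparse intermediate code means different sequences or tasks exercise largely disjoint recurrent weights, reducing interference between them.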