The 2013 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn.2013.6707047
Mitigation of catastrophic forgetting in recurrent neural networks using a Fixed Expansion Layer

Abstract: Catastrophic forgetting (or catastrophic interference) in supervised learning systems is the drastic loss of previously stored information caused by the learning of new information. While substantial work has been published on addressing catastrophic forgetting in memoryless supervised learning systems (e.g. feedforward neural networks), the problem has received limited attention in the context of dynamic systems, particularly recurrent neural networks. In this paper, we introduce a solution for mitig…

Cited by 15 publications (8 citation statements). References 15 publications.
“…Another reason why this heuristic is often used is that long sentences may be problematic when training an LSTM, because of the well-known exploding/vanishing gradient issue, but also because of the issue of catastrophic forgetting in neural networks. While gradient-clipping and linear pass-through gated connections are suitable solutions to somewhat mitigate the former, the latter remains problematic in recurrent networks [7]. Conversely, truncating and padding may have undesired consequences.…”
Section: Systems Configuration (mentioning)
confidence: 99%
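For context on the gradient clipping the statement above refers to, here is a minimal NumPy sketch of global-norm clipping; the function name and the max_norm default are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def clip_gradient_norm(grads, max_norm=1.0):
    # Global L2 norm across all gradient arrays.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Rescale every gradient when the norm exceeds the threshold,
    # which bounds the size of each update step.
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads
```

Note that this addresses only the exploding-gradient problem; as the quoted statement observes, it does nothing for catastrophic forgetting.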
“…The Fixed Expansion Layer [24] introduced the use of a large, sparse layer to disentangle the model activations. Subsequently, the Fixed Expansion Layer has been applied to recurrent models [25]. However, in order to build the sparse layer in an optimal way, the model requires solving a quadratic optimization problem (feature-sign search algorithm), which can be problematic in real-world problems (as we discuss in Section 6).…”
Section: Survey Of Continual Learning In Recurrent Models (mentioning)
confidence: 99%
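The quoted statement describes the Fixed Expansion Layer as a large, sparse layer whose optimal construction requires sparse coding (the feature-sign search step). As a rough illustration only, the sketch below substitutes a much simpler scheme, a frozen random projection followed by k-winners-take-all sparsification, to show how a wide fixed layer can yield sparse, largely non-overlapping codes; the class and parameter names are mine, not from [24] or [25].

```python
import numpy as np

class FixedExpansionLayer:
    """Wide frozen projection + top-k sparsification (an illustrative
    stand-in for the sparse-coding construction in the paper)."""

    def __init__(self, in_dim, expand_dim, k, seed=0):
        rng = np.random.default_rng(seed)
        # Weights are fixed at initialization and never trained.
        self.W = rng.standard_normal((in_dim, expand_dim)) / np.sqrt(in_dim)
        self.k = k

    def encode(self, x):
        # x: 1-D input vector, projected into the wide layer...
        a = x @ self.W
        # ...then only the k largest activations are kept
        # (k-winners-take-all), so distinct inputs tend to activate
        # largely disjoint sets of units.
        out = np.zeros_like(a)
        top = np.argsort(a)[-self.k:]
        out[top] = a[top]
        return out
```

Because the expansion weights are fixed and the code is sparse, learning on a new task adjusts only the downstream weights that its few active units select, which is, roughly, the interference-reduction mechanism the FEL papers describe.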
“…Variants of SI have been used for different sequential datasets, but have not been systematically compared against other established methods [21,7,22]. Fixed expansion layers [23] are another method to limit the plasticity of weights and prevent forgetting, and in RNNs take the form of a sparsely activated layer between consecutive hidden states [24]. Lastly, some regularization approaches rely on the use of non-overlapping and orthogonal representations to overcome catastrophic forgetting [25,26,27].…”
Section: Related Work (mentioning)
confidence: 99%
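Since the statement above notes that in RNNs the fixed expansion layer takes the form of a sparsely activated layer between consecutive hidden states, here is a minimal sketch of one such recurrent step. This is a hypothetical helper that reuses the FixedExpansionLayer sketch above via its encode method; none of these names come from [23] or [24].

```python
import numpy as np

def fel_rnn_step(x_t, h_prev, W_xh, W_sh, b, sparse_encode):
    # Re-encode the previous hidden state into a sparse code before
    # it feeds back into the recurrence, so the update driven by one
    # input touches mainly the recurrent weights its code selects.
    s_prev = sparse_encode(h_prev)
    return np.tanh(x_t @ W_xh + s_prev @ W_sh + b)
```

The design intent, per the quoted statements, is to limit weight plasticity: the sparse intermediate code means different sequences or tasks exercise largely disjoint recurrent weights, reducing interference between them.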