Continual Learning for Text Classification with Information Disentanglement Based Regularization

Huang, Yufan; Zhang, Yanzhe; Chen, Jiaao; Wang, Xuezhi; Yang, Diyi

doi:10.18653/v1/2021.naacl-main.218

Cited by 36 publications

(45 citation statements)

References 28 publications

“…Sentence Embedding Alignment (Wang et al, 2019) stores sentences and their representations and tries to ensure a simple linear mapping that maps from old representations to new representations given a batch of sentences. Huang et al (2021) propose an information disentanglement and regularization-based approach which disentangles and applies separate regularization to task-agnostic and task-specific representations. Model regularization-based approaches perform regularization directly in the weight space.…”

Section: Continual Learning Algorithms in NLP (mentioning)

confidence: 99%
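The idea of regularizing two representation spaces separately can be sketched as follows. This is only an illustration of the sentence quoted above, not the authors' implementation: the squared-distance penalty, the function names, and the two weights `lambda_generic` and `lambda_specific` are assumptions made for this example.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def disentangled_regularizer(generic_new, generic_old,
                             specific_new, specific_old,
                             lambda_generic=1.0, lambda_specific=0.1):
    """Penalize drift of each representation from its pre-update value.

    The task-agnostic ("generic") and task-specific representations get
    separate weights, so one space can be held more tightly than the other.
    Both the penalty form and the default weights are hypothetical.
    """
    generic_term = squared_distance(generic_new, generic_old)
    specific_term = squared_distance(specific_new, specific_old)
    return lambda_generic * generic_term + lambda_specific * specific_term
```

In training, a term like this would be added to the classification loss for replayed examples, with the "old" vectors computed by a frozen copy of the model from before the current task.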

“…Model expansion-based approaches separate task-specific parameters from irrelevant ones and freeze shared parameters to prevent catastrophic forgetting. While sometimes not explicitly studied, Adapter-based approaches (Wang et al, 2021) could be applied to continual learning. The algorithms learn a single adapter per task without interference with pretrained weights or other tasks; at the same time, knowledge captured in previous tasks can be effectively fused to new tasks (Pfeiffer et al, 2021).…”

Section: Continual Learning Algorithms in NLP (mentioning)

confidence: 99%

“…Adapter-based approaches add small "adapter" layers between layers of transformers per task (Wang et al, 2021; Houlsby et al, 2019). We follow the adapter design by Pfeiffer et al (2021).…”

Section: Adapter-based Approaches (mentioning)

confidence: 99%
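A per-task adapter of the kind these statements describe can be sketched as a small bottleneck layer with a residual connection. This is a minimal pure-Python illustration, not the design of any cited paper: the class name, the ReLU nonlinearity, the layer sizes, and the zero initialization of the up-projection are all assumptions chosen so the adapter starts out as an identity function.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class BottleneckAdapter:
    """Hypothetical adapter: down-project, ReLU, up-project, add residual."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Small random down-projection: hidden_size -> bottleneck_size.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                     for _ in range(bottleneck_size)]
        # Zero up-projection, so the adapter initially leaves its input unchanged.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def __call__(self, hidden):
        z = [max(0.0, v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, z)
        return [h + d for h, d in zip(hidden, delta)]


# One adapter per task; the shared transformer weights would stay frozen.
adapters = {task: BottleneckAdapter(hidden_size=8, bottleneck_size=2)
            for task in ("task_a", "task_b")}
```

Because each task only updates its own entry in `adapters`, training on a new task cannot overwrite parameters used by an earlier one.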

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

Jin, Zhang, Zhu et al. 2021

Preprint

Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fine-tuned for various downstream tasks. However, when deployed in the real world, a PTLM-based model must deal with data from a new domain that deviates from what the PTLM was initially trained on, or newly emerged data that contains out-of-distribution information. In this paper, we study a lifelong language model pretraining challenge where a PTLM is continually updated so as to adapt to emerging data. Over a domain-incremental research paper stream and a chronologically-ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms, and keep track of the downstream task performance (after fine-tuning) to analyze its ability of acquiring new knowledge and preserving learned knowledge. Our experiments show continual learning algorithms improve knowledge preservation, with logit distillation being the most effective approach. We further show that continual pretraining improves generalization when training and testing data of downstream tasks are drawn from different time steps, but do not improve when they are from the same time steps. We believe our problem formulation, methods, and analysis will inspire future studies towards continual pretraining of language models.

“…Generally, existing CL methods encompass memory and generative replay-based approaches (Robins, 1995; Lopez-Paz and Ranzato, 2017; Shin et al, 2017), regularization based approaches (Kirkpatrick et al, 2017; Nguyen et al, 2018) and model expansion based approaches (Shin et al, 2017). Recently, continual learning has drawn attention in the NLP field (Sun et al, 2020; Wang et al, 2019b; Huang et al, 2021).…”

Section: Related Work (mentioning)

confidence: 99%

Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning

Jin, Lin, Rostami et al. 2021

Findings of the Association for Computational Linguistics: EMNLP 2021

The ability to continuously expand knowledge over time and utilize it to rapidly generalize to new tasks is a key feature of human linguistic intelligence. Existing models that pursue rapid generalization to new tasks (e.g., few-shot learning methods), however, are mostly trained in a single shot on fixed datasets, unable to dynamically expand their knowledge; while continual learning algorithms are not specifically designed for rapid generalization. We present a new learning setup, Continual Learning of Few-Shot Learners (CLIF), to address the challenges of both learning settings in a unified setup. CLIF assumes a model learns from a sequence of diverse NLP tasks arriving sequentially, accumulating knowledge for improved generalization to new tasks, while also retaining performance on the tasks learned earlier. We examine how the generalization ability is affected in the continual learning setup, evaluate a number of continual learning algorithms, and propose a novel regularized adapter generation approach. We find that catastrophic forgetting affects generalization ability to a lesser degree than performance on seen tasks; while continual learning algorithms can still bring considerable benefit to the generalization ability.

“…Both random selection and k-center methods utilise heuristics to update the memory. EA-EMR [5] and IDBR [18] selected informative samples by referencing the centroid of the cluster via K-Means. iCaRL [19] chose samples, that are nearest to the mean of the distribution.…”

Section: Sample Selection Schemes (mentioning)

confidence: 99%
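The centroid-based selection mentioned in this statement can be sketched as: cluster the candidate examples with K-Means, then store the example closest to each centroid. The code below is an illustrative pure-Python version under stated assumptions; the function names are hypothetical, the points stand in for encoder representations, and neither EA-EMR nor IDBR is claimed to be implemented this way.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length float tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids


def select_memory(points, k):
    """Return sorted indices of the examples closest to each K-Means centroid."""
    centroids = kmeans(points, k)
    chosen = {min(range(len(points)), key=lambda i: math.dist(points[i], c))
              for c in centroids}
    return sorted(chosen)
```

For example, `select_memory(representations, k=memory_size)` would pick at most `memory_size` examples to keep for replay, one per cluster, where `representations` and `memory_size` are placeholders for the caller's data and budget.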

Prototype-Guided Memory Replay for Continual Learning

Ho, Liu, Du et al. 2021

Preprint

Continual learning (CL) refers to a machine learning paradigm that using only a small account of training samples and previously learned knowledge to enhance learning performance. CL models learn tasks from various domains in a sequential manner. The major difficulty in CL is catastrophic forgetting of previously learned tasks, caused by shifts in data distributions. The existing CL models often employ a replay-based approach to diminish catastrophic forgetting. Most CL models stochastically select previously seen samples to retain learned knowledge. However, occupied memory size keeps enlarging along with accumulating learned tasks. Hereby, we propose a memory-efficient CL method. We devise a dynamic prototypes-guided memory replay module, incorporating it into an online meta-learning model. We conduct extensive experiments on text classification and additionally investigate the effect of training set orders on CL model performance. The experimental results testify the superiority of our method in alleviating catastrophic forgetting and enabling efficient knowledge transfer.
