DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion

Douillard, Arthur; Ramé, Alexandre; Couairon, Guillaume; Cord, Matthieu

doi:10.1109/cvpr52688.2022.00907

Cited by 172 publications

(79 citation statements)

References 18 publications

“…The results in Table 1 compare Continual-CLIP with other baselines on CIFAR-100 dataset in three different settings (10, 20 and 50 steps). In 10 steps setting, even without any training or fine-tuning, Continual-CLIP achieves competitive results in terms of average and last accuracy, compared with the recent state-of-the-art methods such as DyTox (Douillard et al, 2022) and DER (Yan et al, 2021). Specifically, in 20 steps setting, Continual-CLIP reaches 75.95% in "Avg" accuracy, and for the 50 steps setting, it reaches 76.49% in "Avg" accuracy.…”

Section: Results (mentioning)

confidence: 91%
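
As a reading aid for the "Avg" and "Last" figures quoted above, here is a minimal sketch. It assumes the convention common in class-incremental benchmarks, where "Avg" is the mean of the accuracies measured after each step and "Last" is the accuracy after the final step. The quoted statement does not define the metrics, so this definition is an assumption, and the function name and the example values are hypothetical.

```python
from typing import Sequence, Tuple


def avg_and_last(step_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Summarize a continual-learning run from its per-step accuracies.

    Assumed convention (not stated in the quoted text):
      - "Avg"  = mean of the accuracy measured after every step
      - "Last" = accuracy measured after the final step
    """
    if not step_accuracies:
        raise ValueError("need at least one step accuracy")
    avg = sum(step_accuracies) / len(step_accuracies)
    last = step_accuracies[-1]
    return avg, last


# Made-up accuracies for a 3-step run, for illustration only.
avg, last = avg_and_last([0.9, 0.8, 0.7])
print(f"Avg={avg:.3f} Last={last:.3f}")
```

Under this reading, a model that forgets over time has a "Last" value below its "Avg" value, which is why the two are usually reported together.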

“…For class-incremental settings, we evaluate Continual-CLIP on CIFAR-100, ImageNet-100 & 1K, TinyImageNet under different class splits. (a) In CIFAR-100, we compare the performance on 10 steps (10 new classes per step), 20 steps (5 new classes per step), and 50 steps (2 new classes per step) (Douillard et al, 2022;Yan et al, 2021). (b) In ImageNet-100, we consider two evaluation settings; ImageNet-100-B0 which has the same number of classes for all the steps (i.e., 10 classes per step) and ImageNet-100-B50 that contains 50 classes for the first step and the rest of the 50 classes are observed incrementally in the next 10 steps (5 classes per steps) (Yan et al, 2021).…”

Section: Experimental Protocols (mentioning)

confidence: 99%
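
The split arithmetic in the statement above can be checked with a short sketch. The helper `class_increments` and its parameters are hypothetical names introduced for illustration; this is not code from Continual-CLIP or Continuum, and it assumes classes are assigned to steps in contiguous id order.

```python
from typing import List


def class_increments(num_classes: int, initial: int, per_step: int) -> List[range]:
    """Return the class-id range introduced at each step.

    initial  -- number of classes in the first step
    per_step -- number of new classes in every later step
    """
    if initial <= 0 or per_step <= 0 or initial > num_classes:
        raise ValueError("initial and per_step must be positive and fit num_classes")
    if (num_classes - initial) % per_step != 0:
        raise ValueError("remaining classes must divide evenly into steps")
    steps = [range(0, initial)]
    start = initial
    while start < num_classes:
        steps.append(range(start, start + per_step))
        start += per_step
    return steps


# CIFAR-100 settings described in the quoted text.
cifar_10_steps = class_increments(100, initial=10, per_step=10)
cifar_20_steps = class_increments(100, initial=5, per_step=5)
cifar_50_steps = class_increments(100, initial=2, per_step=2)

# ImageNet-100-B0: 10 classes at every step.
imagenet100_b0 = class_increments(100, initial=10, per_step=10)
# ImageNet-100-B50: 50 classes first, then 5 classes per step.
imagenet100_b50 = class_increments(100, initial=50, per_step=5)

for name, steps in [("CIFAR-100/10", cifar_10_steps),
                    ("CIFAR-100/20", cifar_20_steps),
                    ("CIFAR-100/50", cifar_50_steps),
                    ("ImageNet-100-B0", imagenet100_b0),
                    ("ImageNet-100-B50", imagenet100_b50)]:
    print(name, "steps:", len(steps))
```

Note that the B50 setting yields 11 steps in total under this sketch: one initial step with 50 classes plus the 10 incremental steps mentioned in the text.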

“…Implementation Details: We use the official CLIP (Radford et al, 2021) implementation in zeroshot evaluation settings. To build continual scenarios for class-incremental setting, we heavily used Continuum (Douillard & Lesort, 2021) and follow the same evaluation setting from Douillard et al (2022). For the domain-incremental scenarios on CORe50 and CLEAR datasets, we use the Avalanche library (Lomonaco et al, 2021).…”

Section: Experimental Protocols (mentioning)

confidence: 99%

“…Several specialized methods have been developed in continual learning literature to reduce catastrophic forgetting. Among such methods, typical solutions offer sophisticated techniques involving memory replay (Rebuffi et al, 2017;Shin et al, 2017;Lopez-Paz & Ranzato, 2017), knowledge distillation (Hinton et al, 2015;Li & Hoiem, 2017), model regularization (Kirkpatrick et al, 2017), parameter isolation (Mallya & Lazebnik, 2018;Fernando et al, 2017), and dynamic network expansion (Yan et al, 2021;Douillard et al, 2022;. The resulting methods have a retraining cost at…”

Section: Introduction (mentioning)

confidence: 99%

CLIP model is an Efficient Continual Learner

Thengane, Khan, Hayat, et al. 2022. Preprint.

The continual learning setting aims to learn new tasks over time without forgetting the previous ones. The literature reports several significant efforts to tackle this problem with limited or no access to previous task data. Among such efforts, typical solutions offer sophisticated techniques involving memory replay, knowledge distillation, model regularization, and dynamic network expansion. The resulting methods have a retraining cost at each learning task, dedicated memory requirements, and setting-specific design choices. In this work, we show that a frozen CLIP (Contrastive Language-Image Pretraining) model offers astounding continual learning performance without any fine-tuning (zero-shot evaluation). We evaluate CLIP under a variety of settings including class-incremental, domain-incremental and task-agnostic incremental learning on five popular benchmarks (ImageNet-100 & 1K, CORe50, CIFAR-100, and TinyImageNet). Without any bells and whistles, the CLIP model outperforms the state-of-the-art continual learning approaches in majority of the settings. We show the effect on CLIP model's performance by varying text inputs with simple prompt templates. To the best of our knowledge, this is the first work to report the CLIP zero-shot performance in a continual setting. We advocate the use of this strong yet embarrassingly simple baseline for future comparisons in the continual learning tasks. Code is available at https://github.com/vgthengane/Continual-CLIP.

“…However, including extra data into the current task introduces excessive training time (De Lange et al, 2021). Expansion-based methods (Yoon et al, 2017;2019;Douillard et al, 2022) dynamically allocate new parameters or modules to learn new tasks. While these methods face capacity explosion inevitably after learning a long sequence of tasks.…”

Section: Other Methods (mentioning)

confidence: 99%

Restricted Orthogonal Gradient Projection for Continual Learning

Yang, Yang, Liu. 2023. Preprint.

Continual learning aims to avoid catastrophic forgetting and effectively leverage learned experiences to master new knowledge. Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference, which simultaneously hinders forward knowledge transfer. To address this issue, recent methods reuse frozen parameters with a growing network, resulting in high computational costs. Thus, it remains a challenge whether we can improve forward knowledge transfer for gradient projection approaches using a fixed network architecture. In this work, we propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The basic idea is to adopt a restricted orthogonal constraint allowing parameters optimized in the direction oblique to the whole frozen space to facilitate forward knowledge transfer while consolidating previous knowledge. Our framework requires neither data buffers nor extra parameters. Extensive experiments have demonstrated the superiority of our framework over several strong baselines. We also provide theoretical guarantees for our relaxing strategy.

The role of lifelong machine learning in bridging the gap between human and machine learning: A scientometric analysis

Abulaish, Wasi, Sharma. 2024. WIREs Data Min & Knowl.

Due to advancements in data collection, storage, and processing techniques, machine learning has become a thriving and dominant paradigm. However, one of its main shortcomings is that the classical machine learning paradigm acts in isolation without utilizing the knowledge gained through learning from related tasks in the past. To circumvent this, the concept of Lifelong Machine Learning (LML) has been proposed, with the goal of mimicking how humans learn and acquire cognition. Human learning research has revealed that the brain connects previously learned information while learning new information from a single or small number of examples. Similarly, an LML system continually learns by storing and applying acquired information. Starting with an analysis of how the human brain learns, this paper shows that the LML framework shares a functional structure with the brain when it comes to solving new problems using previously learned information. It also provides a description of the LML framework, emphasizing its similarities to human brain learning. It also provides citation graph generation and scientometric analysis algorithms for the LML literature, including information about the datasets and evaluation metrics that have been used in the empirical evaluation of LML systems. Finally, it presents outstanding issues and possible future research directions in the field of LML. This article is categorized under: Technologies > Machine Learning
