2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00027
Continual Learning with Lifelong Vision Transformer

Cited by 53 publications (28 citation statements) | References 55 publications
“…Dytox [58] dynamically learns new task tokens, which are then utilized to make the learned embeddings more relevant to the specific task. Lifelong ViT [59] and contrastive ViT [60] introduce cross-attention mechanisms between tasks through external key vectors, and they slow down the changes to these keys to mitigate forgetting. Despite the use of complex mechanisms to prevent forgetting, these methods still require fine-tuning of the network for new classes, which can result in interference with previously learned knowledge.…”
Section: Self-supervised (mentioning)
confidence: 99%
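As a rough illustration of the external-key cross-attention idea quoted above, a minimal sketch follows. It assumes a learnable pool of key/value vectors queried by the current features, with a drift penalty toward a snapshot of the keys taken at the end of the previous task; the class name, shapes, and the MSE-based penalty are illustrative assumptions, not the exact design of LVT or contrastive ViT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExternalKeyCrossAttention(nn.Module):
    """Hypothetical sketch: cross-attention whose keys/values are external,
    task-shared parameters; a penalty toward the previous-task keys slows
    down how fast the keys can change, which is one way to mitigate forgetting."""

    def __init__(self, dim: int, num_keys: int = 16):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.keys = nn.Parameter(torch.randn(num_keys, dim) * 0.02)    # external key vectors
        self.values = nn.Parameter(torch.randn(num_keys, dim) * 0.02)  # external value vectors
        # snapshot of the keys from the previous task (reference for the drift penalty)
        self.register_buffer("prev_keys", self.keys.detach().clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); queries come from the current features
        q = self.q_proj(x)
        attn = F.softmax(q @ self.keys.t() / q.shape[-1] ** 0.5, dim=-1)
        return attn @ self.values

    def key_drift_penalty(self) -> torch.Tensor:
        # add lambda * this term to the task loss to slow down changes to the keys
        return F.mse_loss(self.keys, self.prev_keys)

    @torch.no_grad()
    def end_task(self) -> None:
        # refresh the reference snapshot once a task finishes training
        self.prev_keys.copy_(self.keys)
```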
“…Continual Learning: CL approaches can be broadly classified into: 1) exemplar-replay methods, 2) regularization methods, and 3) dynamic-architecture methods. To avoid forgetting when learning a new task, replay approaches repeat past task samples that are kept in raw format [2,4,6,7,9,16,25,40,52] or generated with a generative model [45]. Usually, replay-based approaches have a fixed memory which stores samples.…”
Section: Related Work (mentioning)
confidence: 99%
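The "fixed memory which stores samples" mentioned in the quote is typically a small exemplar buffer. Below is a minimal, hypothetical sketch using reservoir sampling; the class name, raw-sample storage format, and sampling policy are assumptions and not tied to any single cited method.

```python
import random

class ReplayMemory:
    """Hypothetical fixed-size exemplar buffer for replay-based continual
    learning, filled with reservoir sampling so every seen example has an
    equal chance of being kept."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = []      # stores (sample, label) pairs in raw format
        self.num_seen = 0

    def add(self, sample, label) -> None:
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((sample, label))
        else:
            # replace a random slot with probability capacity / num_seen
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = (sample, label)

    def sample(self, batch_size: int):
        # mix stored exemplars into the current task's mini-batches
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```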
“…Recently, Dytox [16] proposed to learn new tasks through the expansion of special tokens known as task tokens. Another recent approach, LVT [52], proposed an inter-task attention mechanism that absorbs the previous tasks' information and slows down the drift of information between previous and current tasks. Both Dytox and LVT require extra memory for storing training instances from previous tasks.…”
Section: Related Work (mentioning)
confidence: 99%
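For readers unfamiliar with the task-token expansion pattern named in the quote, the following hypothetical sketch shows the general idea of adding one learnable token per task and reading its output as a task-conditioned embedding; the module name, the single transformer layer, and all hyperparameters are illustrative assumptions rather than Dytox's or LVT's actual architecture.

```python
import torch
import torch.nn as nn

class TaskTokenPool(nn.Module):
    """Hypothetical sketch of task-token expansion: one learnable token is
    added per task, prepended to the patch tokens of a shared encoder, and
    its output slot is read out as a task-specific embedding."""

    def __init__(self, dim: int, nhead: int = 4):
        super().__init__()
        self.dim = dim
        self.task_tokens = nn.ParameterList()
        self.blend = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead, batch_first=True)

    def add_task(self) -> None:
        # expand the pool with a fresh learnable token when a new task arrives
        self.task_tokens.append(nn.Parameter(torch.zeros(1, 1, self.dim)))

    def forward(self, patch_tokens: torch.Tensor, task_id: int) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, dim) from a shared ViT backbone
        b = patch_tokens.size(0)
        tok = self.task_tokens[task_id].expand(b, -1, -1)
        out = self.blend(torch.cat([tok, patch_tokens], dim=1))
        return out[:, 0]      # embedding conditioned on the selected task token
```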