2022
DOI: 10.1613/jair.1.13689

CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings

Abstract: This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision. Given a pre-trained multimodal embedding model, where language and images are projected in the same semantic space (in this case CLIP by OpenAI), CoLLIE learns a transformation function that adjusts the language embeddings when needed to accommodate new language use. This is done by predicting the difference vector that needs to be applied, as well as a scaling factor for this vector, so that…
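The abstract describes the core mechanism: a transformation function over frozen CLIP text embeddings that predicts a difference vector plus a scaling factor for that vector. The sketch below shows one way this could look in PyTorch; the module name CollieTransform, the two-layer heads, the hidden size, and the final re-normalisation are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the transformation described in the abstract: a small
# network predicts a difference vector and a scaling factor that are applied
# to a frozen CLIP text embedding. Names (CollieTransform, hidden_dim, etc.)
# are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn


class CollieTransform(nn.Module):
    def __init__(self, embed_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        # Predicts the difference vector to add to the original embedding.
        self.diff_head = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )
        # Predicts a scalar in [0, 1] that scales the difference vector,
        # so language use that needs no adjustment can pass through
        # almost unchanged.
        self.scale_head = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, text_embedding: torch.Tensor) -> torch.Tensor:
        diff = self.diff_head(text_embedding)    # difference vector
        scale = self.scale_head(text_embedding)  # scaling factor
        adjusted = text_embedding + scale * diff
        # Keep the output on the unit hypersphere, since CLIP embeddings are
        # typically L2-normalised before computing similarities.
        return adjusted / adjusted.norm(dim=-1, keepdim=True)


# Usage with a frozen CLIP text embedding (shape [batch, 512] for ViT-B/32).
if __name__ == "__main__":
    transform = CollieTransform(embed_dim=512)
    dummy_text_emb = torch.randn(2, 512)
    dummy_text_emb = dummy_text_emb / dummy_text_emb.norm(dim=-1, keepdim=True)
    adjusted_emb = transform(dummy_text_emb)
    print(adjusted_emb.shape)  # torch.Size([2, 512])
```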

Cited by 7 publications (5 citation statements)
References 28 publications
“…Numerous CL algorithms have been developed in the literature (Srivastava et al, 2013; Farquhar & Gal, 2018; Kim et al, 2018; Morgado & Vasconcelos, 2019; Benavides-Prado et al, 2020; Parisi & Lomonaco, 2020; Ahn et al, 2021; Ayub & Wagner, 2021; Cha et al, 2021a; Derakhshani et al, 2021; Ehret et al, 2021; Hurtado et al, 2021; Kapoor et al, 2021; Mao et al, 2021; Tang & Matteson, 2021; Yoon et al, 2021; Benavides-Prado & Riddle, 2022; Madaan et al, 2022; Ramesh & Chaudhari, 2022; Romero et al, 2022a; Skantze & Willemsen, 2022; Wang et al, 2022b; Gaya et al, 2023; Mundt et al, 2023). On a high level, there are three principal approaches to continual learning (CL): memory-based, regularization-based and architecture-based (Pan et al, 2020; Parisi & Lomonaco, 2020; Krishnan & Balaprakash, 2021; Mehta et al, 2021; …).…”
Section: Related Work (mentioning)
confidence: 99%
“…Understanding which model components (here referred to as modules) are suited for task-specialization requires insights on their role in solving each task, which prior works on continual Vision-Language (VL) grounding fail to explain. At the same time, there is a lack of suitable benchmarks that allow for a fine-grained model analysis, as existing CL scenarios for language grounding based on synthetic images are too simplistic concerning the CL problem (e.g., only one distributional shift) (Greco et al, 2019) or the VL grounding problem (e.g., single-object, trivial language) (Skantze and Willemsen, 2022), while those based on real-world images (Srinivasan et al, 2022; Jin et al, 2020) may encourage models to take shortcuts in the vision-language reasoning process, which leaves the generalizability of statements derived from any model analysis questionable. To this end, we introduce the LIfelong LAnguage Compositions (LILAC) benchmark suite that comprises two diagnostic VL datasets that allow for investigating the continual learning behavior of the models with a high degree of control and flexibility while being challenging enough to require object localization, spatial reasoning, concept learning, and language grounding capabilities of the continual learner.…”
Section: Training From Scratch (mentioning)
confidence: 99%
“…A few works explore the intersection of continual learning and visually grounded language learning with diagnostic (Skantze and Willemsen, 2022; Greco et al, 2019) and real-world (Srinivasan et al, 2022; Jin et al, 2020) datasets. All works conclude that common CL baselines struggle with striking a balance between forgetting and cross-task knowledge transfer, yet do not provide any insights on how this struggle is connected with the learning behaviors of the architecture used.…”
Section: Related Work (mentioning)
confidence: 99%
“…Re-training a model with an expanded dataset for each new concept is prohibitively expensive, and fine-tuning on few examples typically leads to catastrophic forgetting (Ding et al, 2022; …). More measured approaches freeze the model and train transformation modules to adapt its output when faced with new concepts (Gao et al, 2021; Skantze & Willemsen, 2022). However, these approaches are still prone to forgetting prior knowledge, or face difficulties in accessing it concurrently with newly learned concepts (Kumar et al, 2022; …).…”
Section: Introduction (mentioning)
confidence: 99%