Patching open-vocabulary models by interpolating weights
Preprint, 2022
DOI: 10.48550/arxiv.2208.05592

Cited by 6 publications (7 citation statements). References 0 publications.

“…This yields improved classification accuracy on IMAGENET. Along the same line, [24] fine-tune a model trained on IMAGENET on several new image classification datasets, and show that interpolating the original and fine-tuned parameters yields classifiers that perform well on all tasks.…”
Section: Related Work (mentioning, confidence: 86%)
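
The interpolation described in the statement above admits a compact sketch. The following is a minimal PyTorch illustration, not the cited works' implementation; the checkpoint file names and the 0.5 coefficient are illustrative assumptions.

```python
# Minimal sketch of weight-space interpolation ("patching"): mix the original
# and fine-tuned parameters tensor by tensor. Assumes both state dicts come
# from the same architecture; names are illustrative.
import torch

def interpolate_weights(theta_original, theta_finetuned, alpha):
    """Return (1 - alpha) * theta_original + alpha * theta_finetuned, per tensor."""
    return {
        key: (1.0 - alpha) * theta_original[key] + alpha * theta_finetuned[key]
        for key in theta_original
    }

# Hypothetical usage: load both checkpoints and patch the model at alpha = 0.5.
# model.load_state_dict(interpolate_weights(torch.load("original.pt"),
#                                            torch.load("finetuned.pt"), 0.5))
```

A single coefficient alpha trades off the original model's behavior against the fine-tuned one's; the cited works report that intermediate values can perform well on both the original and the new tasks.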
“…In the following, we formally introduce the two main components of our procedure to merge adversarially robust models: (1) obtaining models which can be interpolated by fine-tuning a single ℓp-robust classifier, and (2) interpolating their weights to balance their different types of robustness. We highlight that our setup diverges from that of prior works on parameter averaging: in fact, both [24,46] combine models fine-tuned on the same task, i.e. achieving high classification accuracy on unperturbed images, either on a fixed dataset with different hyperparameter configurations [46] or on varying datasets [24].…”
Section: Model Interpolation Across Different Tasks (mentioning, confidence: 91%)
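
The balancing step mentioned in this statement can be illustrated as a sweep over the interpolation coefficient. The sketch below is an assumption-laden illustration, not the citing paper's code: evaluate_linf and evaluate_l2 are hypothetical evaluation hooks for two robustness types, and the two state dicts are assumed architecture-compatible.

```python
# Sweep the interpolation coefficient between two fine-tuned robust models and
# record both robustness metrics at each point, to pick a trade-off afterwards.
import torch

def sweep_alpha(model, theta_a, theta_b, evaluate_linf, evaluate_l2, steps=11):
    results = []
    for i in range(steps):
        alpha = i / (steps - 1)
        merged = {k: (1.0 - alpha) * theta_a[k] + alpha * theta_b[k] for k in theta_a}
        model.load_state_dict(merged)
        results.append((alpha, evaluate_linf(model), evaluate_l2(model)))
    return results  # choose the alpha giving the preferred robustness balance
```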
“…Fine-Tuning Pre-Trained Representations. Several recent works study how to adapt pre-trained representations for downstream tasks (Kumar et al, 2022;Wortsman et al, 2022;Ilharco et al, 2022;Lee et al, 2022;Kirichenko et al, 2022;Goyal et al, 2022;Dong et al, 2022), motivated by the emergence of large pre-trained models (Radford et al, 2021;Brown et al, 2020) capable of zero-shot transfer. Most closely related to our work are a few concurrent works that find using the CLIP objective to fine-tune CLIP is more effective than alternative fine-tuning approaches (Goyal et al, 2022;Dong et al, 2022).…”
Section: Related Work (mentioning, confidence: 99%)
“…However, we aim to study the fundamental question of how to finetune multi-modal (as opposed to uni-modal) models. A crucial difference between prior art and ours is the use of textual information, as all existing methods [41,100,111,113] repurpose additional text features as classifier weights instead of training samples. We demonstrate in this paper that crossmodal adaptation is not only more performant but can also benefit prior uni-modal approaches.…”
Section: Introduction (mentioning, confidence: 99%)
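
The idea of repurposing text features as classifier weights, as described in this statement, can be sketched as CLIP-style zero-shot classification. The encode_text and encode_image callables below are placeholders standing in for a vision-language model's encoders, not a specific library API.

```python
# Sketch: normalized text embeddings of class prompts act as the rows of a
# linear classifier; an image is labeled by its most similar text embedding.
import torch
import torch.nn.functional as F

def build_classifier(encode_text, class_prompts):
    """Stack normalized text embeddings as classifier weights, (num_classes, dim)."""
    with torch.no_grad():
        return torch.stack([F.normalize(encode_text(p), dim=-1) for p in class_prompts])

def classify(encode_image, classifier_weights, image):
    image_feat = F.normalize(encode_image(image), dim=-1)  # (dim,)
    logits = classifier_weights @ image_feat               # cosine similarities
    return logits.argmax().item()
```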