2022
DOI: 10.48550/arxiv.2203.16329
Preprint

Parameter-efficient Model Adaptation for Vision Transformers

Abstract: In computer vision, great success has been achieved by adapting large-scale pretrained vision models (e.g., Vision Transformers) to downstream tasks via fine-tuning. Common approaches to fine-tuning either update all model parameters or use linear probes. In this paper, we study parameter-efficient fine-tuning strategies for Vision Transformers on vision tasks. We formulate efficient fine-tuning as a subspace training problem and perform a comprehensive benchmark of different efficient fine-tuning…
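The abstract frames efficient fine-tuning as subspace training: the pretrained weights stay fixed and only a small number of new parameters are optimized. As one illustrative instance of this family (a minimal sketch, not the paper's specific method; the wrapper class and the timm model/attribute names are assumptions), a low-rank reparameterization of a ViT projection layer might look like:

```python
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Freeze a pretrained linear layer and learn a rank-r update: y = W x + B A x."""
    def __init__(self, linear: nn.Linear, rank: int = 4):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))  # zero init: no change at start

    def forward(self, x):
        return self.linear(x) + x @ self.A.t() @ self.B.t()

# Hypothetical usage with a timm ViT (attribute names assumed; verify for your checkpoint):
# import timm
# model = timm.create_model("vit_base_patch16_224", pretrained=True)
# for blk in model.blocks:
#     blk.attn.qkv = LowRankAdapter(blk.attn.qkv, rank=4)
# trainable = [p for p in model.parameters() if p.requires_grad]  # only A and B per block
```

Only A and B are trained, so the trainable-parameter count per layer drops from d_out × d_in to r × (d_in + d_out).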

Cited by 7 publications (7 citation statements)
References 32 publications (70 reference statements)
“…Their WRN-based approach achieved a remarkable accuracy of 99.17%, setting a new benchmark in efficiency and accuracy. Recently, other variants, including hierarchical ViTs with diverse resolutions and spatial embeddings [44], have been proposed. Without a doubt, the advancements in large ViTs underscore the importance of developing efficient model adaptation strategies.…”
Section: Related Work
Mentioning confidence: 99%
“…For VLP models, it provides a unique opportunity to leverage the text encoders for model adaptation, including Conditional Prompt Learning (Zhou et al, 2022b), Color Prompt Tuning (CPT) (Yao et al, 2021), VL-Adapter (Sung et al, 2022b), and CLIP Adapter (Gao et al, 2021). A comprehensive study on parameter efficiency can be found in He et al (2022). • Robustness.…”
Section: Advanced Topics
Mentioning confidence: 99%
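The adapter-style methods quoted above share a common pattern: a small residual bottleneck module trained inside an otherwise frozen backbone. A minimal sketch of that pattern (generic; not the exact VL-Adapter or CLIP-Adapter architecture, and the dimensions are assumptions):

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual down-project / nonlinearity / up-project module, trained in a frozen backbone."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # identity at initialization: adapter adds nothing yet
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))
```

Zero-initializing the up-projection makes the adapter an identity map at the start of training, so the pretrained model's behavior is preserved before adaptation begins.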
“…Very recently, the Segment Anything Model (SAM) [21] has gained massive attention as a powerful and general vision segmentation model capable of generating varied and fine-grained segmentation masks conditioned on a user prompt. Despite its strong performance on natural images, many recent studies have shown that Adaptation can be easily adopted in various downstream computer vision tasks [4,17]. Therefore, we believe Adaptation is the most fitting technique for carrying SAM to the medical domain.…”
Section: Introduction
Mentioning confidence: 99%