2022
DOI: 10.48550/arxiv.2206.06522
Preprint

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Cited by 5 publications (6 citation statements)
References 0 publications
“…We present local non-overlapping [6] and overlapping [67] self-attention for outdoor and indoor cases, respectively, owing to their different illumination and surroundings. Secondly, CasMTR enjoys flexible training via a novel Parameter and Memory-efficient Tuning method (PMT), which was originally derived for NLP tasks [44]. Essentially, PMT can incrementally fine-tune CasMTR on top of off-the-shelf matching models, with reliable coarse matching initialization and fast convergence.…”
Section: Methods (mentioning, confidence: 99%)
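As a rough illustration of the non-overlapping local self-attention mentioned in the statement above, here is a minimal PyTorch-style sketch. It is not CasMTR's actual implementation; the class name, window size, and dimensions are assumptions made for the example.

```python
import torch
import torch.nn as nn

class LocalWindowAttention(nn.Module):
    """Minimal non-overlapping (windowed) self-attention sketch: attention is
    computed independently inside each fixed-size window of the sequence."""

    def __init__(self, dim=64, num_heads=4, window=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):  # x: (batch, seq_len, dim), seq_len divisible by window
        b, n, d = x.shape
        xw = x.reshape(b * n // self.window, self.window, d)  # split into windows
        out, _ = self.attn(xw, xw, xw)   # self-attention within each window only
        return out.reshape(b, n, d)

tokens = torch.randn(2, 16, 64)                 # dummy feature tokens
print(LocalWindowAttention()(tokens).shape)     # torch.Size([2, 16, 64])
```

An overlapping variant would instead let neighboring windows share tokens, trading extra computation for cross-window context.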
“…Zhang et al. [70] search for optimal configurations to combine multiple VPET approaches following a once-for-all scheme [7,61]. Since the additional parameters require extra computation compared to full fine-tuning, a few recent works [53,55] design specific architectures that avoid storing the intermediate activations, thereby alleviating the fine-tuning memory cost. However, it is noteworthy that enhancing training efficiency is not the primary objective of our work.…”
Section: Related Work (mentioning, confidence: 99%)
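To illustrate the memory-saving idea this statement refers to, the sketch below (hypothetical module and variable names, not any cited paper's code) runs a frozen backbone under torch.no_grad() so its intermediate activations are never kept for backpropagation; only a small trainable head contributes to the autograd graph. Merely freezing parameters is not enough on its own: if trainable modules sat inside the backbone's forward path, the backbone activations would still have to be stored to backpropagate through them.

```python
import torch
import torch.nn as nn

# Hypothetical frozen backbone and small trainable head, for illustration only.
backbone = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
head = nn.Linear(768, 10)

for p in backbone.parameters():
    p.requires_grad = False        # frozen: no gradients, no optimizer state

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

x = torch.randn(8, 768)            # dummy batch
labels = torch.randint(0, 10, (8,))

with torch.no_grad():              # backbone activations are not kept for backprop
    feats = backbone(x)

logits = head(feats)               # only the head builds an autograd graph
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                    # gradients reach the head parameters only
optimizer.step()
optimizer.zero_grad()
```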
“…Lester et al. [23] proposed prepending a trainable tensor to the original model input. Sung et al. [17] introduced a novel Ladder-Side Tuning (LST) paradigm, which fine-tunes only a small Transformer network attached alongside the original model. In this architecture, only the parameters of the newly added network are updated, which saves computation cost.…”
Section: Parameter-efficient Fine-tuning (mentioning, confidence: 99%)
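The LST design summarized above can be illustrated with a simplified PyTorch-style sketch; the module names, widths, and additive fusion are assumptions for the example rather than the authors' exact implementation. A frozen backbone produces hidden states that are detached and down-projected into a narrow, trainable side network via "ladder" connections, so gradients never flow through the backbone.

```python
import torch
import torch.nn as nn

class LadderSideNet(nn.Module):
    """Simplified ladder-side-tuning sketch: a narrow trainable side network
    fed by detached hidden states from a frozen backbone."""

    def __init__(self, backbone_layers, d_backbone=768, d_side=96, num_classes=10):
        super().__init__()
        self.backbone_layers = backbone_layers            # pretrained, kept frozen
        for p in self.backbone_layers.parameters():
            p.requires_grad = False
        n = len(backbone_layers)
        # Down-projections from the backbone width to the narrow side width
        self.downs = nn.ModuleList(nn.Linear(d_backbone, d_side) for _ in range(n))
        # Small side blocks (stand-ins for tiny Transformer layers)
        self.side_blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(d_side, d_side), nn.ReLU()) for _ in range(n)
        )
        self.classifier = nn.Linear(d_side, num_classes)

    def forward(self, x):
        h = x
        s = torch.zeros(x.size(0), self.downs[0].out_features, device=x.device)
        for layer, down, block in zip(self.backbone_layers, self.downs, self.side_blocks):
            with torch.no_grad():                 # backbone runs without building a graph
                h = layer(h)
            s = block(s + down(h.detach()))       # ladder connection into the side path
        return self.classifier(s)                 # gradients touch only side modules

# Usage with a toy frozen backbone (hypothetical sizes)
backbone_layers = nn.ModuleList(
    nn.Sequential(nn.Linear(768, 768), nn.ReLU()) for _ in range(4)
)
model = LadderSideNet(backbone_layers)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

x = torch.randn(8, 768)
loss = nn.functional.cross_entropy(model(x), torch.randint(0, 10, (8,)))
loss.backward()                                   # the backbone receives no gradients
optimizer.step()
```

Because the side network is the only part of the model in the autograd graph, both the optimizer state and the stored activations scale with the small side width rather than the backbone width.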
“…Different from previous studies, our work introduces a novel approach that combines an additional CNN as a complementary encoder within the SAM architecture. Our approach draws inspiration from the Ladder-Side Tuning (LST) network for Transformers [17]. It enables the flexible integration of an additional network while avoiding backpropagation through the entire large model (i.e., the SAM encoder), leading to faster training and reduced resource costs.…”
Section: Introduction (mentioning, confidence: 99%)
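As a rough sketch of this kind of design (the shapes and module names are assumptions for illustration, not the cited paper's actual architecture), a small trainable CNN branch can run alongside a frozen ViT-style encoder and the two feature maps can be fused, so backpropagation never enters the large frozen encoder:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a frozen "large" encoder and a small trainable CNN branch.
frozen_encoder = nn.Sequential(nn.Conv2d(3, 256, 16, stride=16), nn.GELU())  # ViT-like patchify stub
for p in frozen_encoder.parameters():
    p.requires_grad = False

cnn_branch = nn.Sequential(          # trainable complementary encoder
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 256, 3, stride=8, padding=1), nn.ReLU(),
)
fuse = nn.Conv2d(512, 256, 1)        # trainable fusion of the two feature maps

params = list(cnn_branch.parameters()) + list(fuse.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-4)

img = torch.randn(2, 3, 64, 64)
with torch.no_grad():                # no graph, no gradients through the large encoder
    f_frozen = frozen_encoder(img)   # (2, 256, 4, 4)
f_cnn = cnn_branch(img)              # (2, 256, 4, 4)
fused = fuse(torch.cat([f_frozen, f_cnn], dim=1))

loss = fused.mean()                  # dummy objective for the sketch
loss.backward()                      # only the CNN branch and fusion get gradients
optimizer.step()
```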