2021
DOI: 10.48550/arxiv.2102.01386
Preprint
AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

Abstract: With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models pre-trained on a large corpus of data. However, our experiments show that even fine-tuning on models like BERT can take many hours when using GPUs. While prior work proposes limiting the number of layers that are fine-tuned, e.g., freezing all layers but the last layer, we find that such static approaches lead to reduced accuracy. We propose AutoFreeze, a system that uses an adaptive approach to ch…
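The abstract describes accelerating fine-tuning by freezing some model blocks. Below is a minimal sketch of block freezing for BERT fine-tuning, assuming the HuggingFace Transformers and PyTorch APIs; the fixed `num_frozen_blocks` value is a hypothetical choice for illustration, since AutoFreeze itself decides adaptively during training which blocks to freeze.

```python
# Minimal sketch: freeze a prefix of BERT encoder blocks before fine-tuning.
# Assumes HuggingFace Transformers + PyTorch; not the AutoFreeze implementation.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

num_frozen_blocks = 8  # hypothetical; AutoFreeze adjusts this choice adaptively

# Freeze the embeddings and the first `num_frozen_blocks` encoder blocks.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for block in model.bert.encoder.layer[:num_frozen_blocks]:
    for param in block.parameters():
        param.requires_grad = False

# Only trainable parameters go to the optimizer; frozen blocks skip gradient
# computation and optimizer updates, which is where the speedup comes from.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5
)
```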

Cited by 10 publications (17 citation statements). References 62 publications.
“…Radiya-Dixit and Wang [20] showed that it suffices to fine-tune only the most critical layers. Similar results can be found in [15,22]. While a partial FT can be advantageous over a full FT, they both require the pre-trained BERT to be modified.…”
Section: Lightweight Fine-tuning (LFT)
Citation type: supporting
confidence: 72%
“…Shen et al. [45] observe that not all pre-trained knowledge is necessarily beneficial for few-shot learning tasks, and hence propose to only transfer partial knowledge from the pre-trained model by selectively freezing some layers and fine-tuning others. SpotTune [46] and AutoFreeze [47] are adaptive fine-tuning approaches that also freeze some layers while adapting others, resulting in improved performance on computer vision and language modeling respectively. Recently, Lee et al. [25] show that surgical fine-tuning (i.e., fine-tuning only a subset of layers) can yield better results than end-to-end fine-tuning when adapting to various distribution shifts.…”
Section: B Transfer Learning
Citation type: mentioning
confidence: 99%
“…These methods generally include two categories according to whether new trainable parameters are introduced. One category is that only a subset of model parameters can be updated while freezing the remainder (Liu et al., 2021b; Lee et al., 2019). The other is introducing a few task-specific new parameters to different parts of pre-trained models, such as before multi-head attention (Li & Liang, 2021), after feed-forward layers (Houlsby et al., 2019), or Mixed-and-Match methods (MAM adapter) proposed by He et al. (2021).…”
Section: Parameter-efficient Tuning
Citation type: mentioning
confidence: 99%
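As a rough illustration of the second category in the statement above (adding a few new trainable parameters while keeping the pre-trained weights frozen), here is a minimal adapter-style sketch in PyTorch. The `Adapter` module, its bottleneck size, and the name-based freezing rule are illustrative assumptions, not the API or implementation of any of the cited methods.

```python
# Minimal adapter-style sketch (illustrative only): a small bottleneck module
# is trained while all pre-trained parameters stay frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

def mark_trainable(model: nn.Module) -> None:
    # Assumes adapters are registered under attribute names containing
    # "adapter"; everything else (the pre-trained backbone) is frozen.
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

In this setup only the adapter weights (a small fraction of the total parameter count) receive gradients, which is the trade-off these parameter-efficient tuning methods exploit.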
“…To mitigate these issues, there has recently been one line of research on Parameter-Efficient Language model Tuning (PELT). A few lightweight transfer learning methods have been proposed; they only update a subset of model parameters while freezing the remaining parameters (Liu et al., 2021b). Extra trainable task-specific model parameters can also be newly introduced to PLMs, such as the widely used adapter-tuning (Houlsby et al., 2019) and prefix-tuning (Li & Liang, 2021) methods.…”
Section: Introduction
Citation type: mentioning
confidence: 99%