2021
DOI: 10.48550/arxiv.2102.01386
Preprint
AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning

Abstract: With the rapid adoption of machine learning (ML), a number of domains now use the approach of fine-tuning models pre-trained on a large corpus of data. However, our experiments show that even fine-tuning on models like BERT can take many hours when using GPUs. While prior work proposes limiting the number of layers that are fine-tuned, e.g., freezing all layers but the last layer, we find that such static approaches lead to reduced accuracy. We propose AutoFreeze, a system that uses an adaptive approach to ch…
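The abstract describes accelerating fine-tuning by freezing some model blocks. Below is a minimal sketch of block freezing for BERT fine-tuning, assuming the HuggingFace Transformers and PyTorch APIs; the fixed `num_frozen_blocks` value is a hypothetical choice for illustration, since AutoFreeze itself decides adaptively during training which blocks to freeze.

```python
# Minimal sketch: freeze a prefix of BERT encoder blocks before fine-tuning.
# Assumes HuggingFace Transformers + PyTorch; not the AutoFreeze implementation.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

num_frozen_blocks = 8  # hypothetical; AutoFreeze adjusts this choice adaptively

# Freeze the embeddings and the first `num_frozen_blocks` encoder blocks.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for block in model.bert.encoder.layer[:num_frozen_blocks]:
    for param in block.parameters():
        param.requires_grad = False

# Only trainable parameters go to the optimizer; frozen blocks skip gradient
# computation and optimizer updates, which is where the speedup comes from.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5
)
```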

Cited by 10 publications (17 citation statements). References 62 publications.
“…Radiya-Dixit and Wang [20] showed that it suffices to fine-tune only the most critical layers. Similar results can be found in [15,22]. While a partial FT can be advantageous over a full FT, they both require the pre-trained BERT to be modified.…”
Section: Lightweight Fine-tuning (LFT)
Citation type: supporting
confidence: 72%
“…Shen et al. [45] observe that not all pre-trained knowledge is necessarily beneficial for few-shot learning tasks, and hence propose to only transfer partial knowledge from the pre-trained model by selectively freezing some layers and fine-tuning others. SpotTune [46] and AutoFreeze [47] are adaptive fine-tuning approaches that also freeze some layers while adapting others, resulting in improved performance on computer vision and language modeling respectively. Recently, Lee et al. [25] show that surgical fine-tuning (i.e., fine-tuning only a subset of layers) can yield better results than end-to-end fine-tuning when adapting to various distribution shifts.…”
Section: B Transfer Learning
Citation type: mentioning
confidence: 99%
“…These methods generally include two categories according to whether new trainable parameters are introduced. One category is that only a subset of model parameters can be updated while freezing the remainder (Liu et al., 2021b; Lee et al., 2019). The other is introducing a few task-specific new parameters to different parts of pre-trained models, such as before multi-head attention (Li & Liang, 2021), after feed-forward layers (Houlsby et al., 2019), or Mixed-and-Match methods (MAM adapter) proposed by He et al. (2021).…”
Section: Parameter-efficient Tuning
Citation type: mentioning
confidence: 99%
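As a rough illustration of the second category in the statement above (adding a few new trainable parameters while keeping the pre-trained weights frozen), here is a minimal adapter-style sketch in PyTorch. The `Adapter` module, its bottleneck size, and the name-based freezing rule are illustrative assumptions, not the API or implementation of any of the cited methods.

```python
# Minimal adapter-style sketch (illustrative only): a small bottleneck module
# is trained while all pre-trained parameters stay frozen.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

def mark_trainable(model: nn.Module) -> None:
    # Assumes adapters are registered under attribute names containing
    # "adapter"; everything else (the pre-trained backbone) is frozen.
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

In this setup only the adapter weights (a small fraction of the total parameter count) receive gradients, which is the trade-off these parameter-efficient tuning methods exploit.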
“…To mitigate these issues, there has recently been one line of research on Parameter-Efficient Language model Tuning (PELT). A few lightweight transfer learning methods have been proposed; they only update a subset of model parameters while freezing the remaining parameters (Liu et al., 2021b). Extra trainable task-specific model parameters can also be newly introduced to PLMs, such as the widely used adapter-tuning (Houlsby et al., 2019) and prefix-tuning (Li & Liang, 2021) methods.…”
Section: Introduction
Citation type: mentioning
confidence: 99%