Findings of the Association for Computational Linguistics: NAACL 2022
DOI: 10.18653/v1/2022.findings-naacl.199

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Abstract: Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks with only a vector and a linear layer. Extensive experiments are conducted …
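Taken at face value, the abstract describes the adapter as a single task vector combined with a per-token weight produced by a linear layer. A hedged sketch of that shift (the symbols h_i, v, α_i, and L_α are our notation, not necessarily the paper's):

\[
  h_i' = h_i + \alpha_i \, v, \qquad \alpha_i = L_\alpha(h_i),
\]

where h_i is the hidden output of a transformer layer at token i, v is the task-specific shift vector, and L_α is the linear layer producing the token-dependent weight α_i.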

Cited by 14 publications (5 citation statements) · References 0 publications

Citation statements (ordered by relevance):
“…BitFit (Ben Zaken, Goldberg, and Ravfogel 2022) is a competitive method among them, where only the bias terms of the model are modified during fine-tuning and the other parameters are frozen. Based on this, AdapterBias (Fu et al. 2022) is proposed, which assigns different representation shifts to task-related tokens according to the importance of the tokens, so as to obtain better fine-tuning effects. Inspired by this advance, we investigate the value of the bias term for domain incremental learning, and introduce the concept of domain bias in our framework to achieve state-of-the-art performance.…”
Section: Bias Tuning
confidence: 99%
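The bias-only recipe quoted above (BitFit) amounts to freezing every parameter except the bias terms before fine-tuning. A minimal sketch, assuming a PyTorch / Hugging Face setup (the checkpoint name and optimizer settings are illustrative, not taken from the cited work):

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # illustrative checkpoint

# BitFit-style tuning: only bias terms receive gradients; all other weights stay frozen.
bias_params = []
for name, param in model.named_parameters():
    if name.endswith(".bias"):
        param.requires_grad = True
        bias_params.append(param)
    else:
        param.requires_grad = False

optimizer = torch.optim.AdamW(bias_params, lr=1e-4)  # optimize only the bias parameters

In practice a small task-specific head is usually trained alongside the biases; the point of the recipe is that the number of updated parameters stays a tiny fraction of the full model.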
“…In our experiments, we did a short pre-training of 10,000 steps with a learning rate equal to 4e-4; after that, all α_j lower than 0.5 were pushed to 0 and the … (Devlin et al., 2018); [b] are from (Fu et al., 2022); [c] are from (Glover and Hokamp, 2019); [d] are from (Pilault et al., 2021); † are results obtained with the best checkpoint in our settings.…”
Section: Implementation Details
confidence: 99%
“…More recently, LoRA [12] learns low-rank matrices to approximate the parameter updates. AdapterBias [13] adds a token-dependent parameter shift to transfer from PLMs in a more parameter-efficient manner. Beyond its parameter efficiency, adapter tuning is also shown to be more robust due to its ability to preserve the pre-trained knowledge [14], and often exhibits robustness in out-of-distribution evaluation [5].…”
Section: Adapter Approach
confidence: 99%
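For comparison with the low-rank update mentioned in this excerpt, a LoRA-style wrapper around a single linear layer could look as follows; this is a sketch of the usual formulation W x + (α/r)·BAx with only A and B trainable, and the class and argument names are ours, not from any cited implementation:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, lora_alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False               # keep the pre-trained W frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # r x d_in
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # d_out x r, starts at zero
        self.scaling = lora_alpha / r

    def forward(self, x):
        # base(x) + scaling * (B A) x: only A and B are updated during fine-tuning
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

Initializing B at zero makes the wrapped layer behave exactly like the frozen layer at the start of fine-tuning, which is the standard LoRA design choice.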
“…AdapterBias [13] adds frame-dependent biases to the representation shifts by using a vector (v) and a linear layer (L_α). v represents the task-specific shift, and L_α produces the weights (α) for input frames.…”
Section: AdapterBias
confidence: 99%
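A minimal sketch of this mechanism, assuming a PyTorch implementation and taking the description above at face value (one vector v and one linear layer L_α producing a scalar weight per token/frame; module and dimension names are ours):

import torch
import torch.nn as nn

class AdapterBiasShift(nn.Module):
    """Token-dependent representation shift: hidden + alpha * v."""
    def __init__(self, d_model: int):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(d_model))  # task-specific shift vector v
        self.L_alpha = nn.Linear(d_model, 1)         # produces one weight alpha per token/frame

    def forward(self, hidden):                       # hidden: [batch, seq_len, d_model]
        alpha = self.L_alpha(hidden)                 # [batch, seq_len, 1]
        return hidden + alpha * self.v               # each position gets its own shift alpha_i * v

Only v and L_α would be trained, so each inserted module adds roughly 2·d_model parameters, which is what the quoted statements mean by a more parameter-efficient shift.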