Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.172

On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation

Abstract: Adapter-based tuning has recently arisen as an alternative to fine-tuning. It works by adding light-weight adapter modules to a pretrained language model (PrLM) and only updating the parameters of adapter modules when learning on a downstream task. As such, it adds only a few trainable parameters per new task, allowing a high degree of parameter sharing. Prior studies have shown that adapter-based tuning often achieves comparable results to fine-tuning. However, existing work only focuses on the parameter-effic…
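For readers unfamiliar with the mechanism the abstract describes, the sketch below shows a Houlsby-style bottleneck adapter and the freezing step that leaves only adapter parameters trainable. It is a minimal PyTorch illustration, not the paper's implementation; the hidden and bottleneck sizes, the `freeze_all_but_adapters` helper, and its name-matching heuristic are assumptions made for this example.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Houlsby-style bottleneck adapter: down-project, nonlinearity,
    up-project, then a residual connection back to the input."""
    def __init__(self, hidden_size=768, bottleneck_size=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states):
        # The residual path keeps the pretrained representation intact,
        # so a near-identity adapter barely perturbs the frozen PrLM.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def freeze_all_but_adapters(model: nn.Module):
    """Hypothetical helper: freeze the pretrained backbone and train only
    parameters whose names contain 'adapter' (real setups often also train
    layer norms and the task-specific head)."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name.lower()
```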

Cited by 70 publications (75 citation statements)
References 33 publications
“…The RoBERTa-large MNLI results of our adapter implementation is on par with the recent state-of-the-art Compacter adapters on T5 (Mahabadi et al., 2021), but generalization in both BERT and RoBERTa is overall worse than with vanilla finetuning. Following on the recent report of adapter efficacy in low-resource setting (He et al., 2021), we conducted an additional experiment with adapters and RoBERTa-large, where the model had to learn from a small, more informative subsample. At 1024 training examples adapters performed better when the MNLI subsample was diverse (selected with K-means-based clustering, see appendix D) rather than randomly selected: 80.7% vs 85%.…”
Section: Negative Results (citation type: mentioning; confidence: 99%)
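The "K-means-based clustering" subsample selection mentioned in the quote above can be sketched roughly as follows: embed each training example, cluster the embeddings into as many groups as the example budget, and keep the example nearest each centroid. This is only an illustration of the general idea; the citing paper's exact procedure is in its appendix D, and the `diverse_subsample` helper, the embedding matrix, and the `budget` parameter here are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_subsample(embeddings: np.ndarray, budget: int = 1024, seed: int = 0) -> np.ndarray:
    """Return indices of a diverse subset: one example per K-means cluster,
    chosen as the member closest to its cluster centroid."""
    km = KMeans(n_clusters=budget, random_state=seed, n_init=10).fit(embeddings)
    chosen = []
    for c in range(budget):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        chosen.append(members[np.argmin(dists)])
    return np.array(chosen)
```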
“…Adapter-tuning has shown to be on par with fine-tuning and sometimes exhibits better effectiveness in the low-resource setting (He et al., 2021). Later studies extend adapter-tuning to multi-lingual (Pfeiffer et al., 2021) and multi-task (Karimi Mahabadi et al., 2021) settings, or further reduce the trainable parameters, which can be easily incorporated into UNIPELT as a replacement of the vanilla adapter-tuning.…”
Section: PELT Methods w/ Additional Parameters (citation type: mentioning; confidence: 99%)
“…We conduct extensive experiments on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2019), which involves four types of natural language understanding tasks including linguistic acceptability (CoLA), sentiment analysis (SST-2), similarity and paraphrase tasks (MRPC, STS-B, QQP), and natural language inference (MNLI, QNLI, RTE). WNLI is omitted following prior studies (Houlsby et al., 2019; Devlin et al., 2019; He et al., 2021; Ben Zaken et al., 2021) due to its adversarial nature. Data Setup.…”
Section: Experiments Setup (citation type: mentioning; confidence: 99%)
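As a concrete companion to the setup described in the quote above, the snippet below loads the eight GLUE tasks with the Hugging Face `datasets` library while omitting WNLI. It is a hedged sketch of a typical data setup, not the citing paper's actual code; the task list and the printed statistics are illustrative.

```python
from datasets import load_dataset

# Eight GLUE tasks commonly used when WNLI is omitted (illustrative setup).
GLUE_TASKS = ["cola", "sst2", "mrpc", "stsb", "qqp", "mnli", "qnli", "rte"]

# Each entry is a DatasetDict with train/validation splits.
glue = {task: load_dataset("glue", task) for task in GLUE_TASKS}

# Quick sanity check: training-set size per task.
print({task: len(splits["train"]) for task, splits in glue.items()})
```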