2022
DOI: 10.1162/tacl_a_00517
Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation

Abstract: Large pretrained language models (PLMs) are often domain- or task-adapted via finetuning or prompting. Finetuning requires modifying all of the parameters and having enough data to avoid overfitting, while prompting requires no training and few examples but limits performance. Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs. This difference is expressed in terms of model weights and sublayer structure through our proposed…

Cited by 12 publications (12 citation statements)
References 28 publications
“…It exploits a low-dimensional manifold that can approximately represent the whole model parameters, and the optimization trajectory follows this manifold. Some delta-tuning methods can be categorized into this approach, for example, LoRA 15, BitFit 14, and diff pruning 44. The other approach seeks a surrogate of the original objective function in a small functional subspace and uses the minimizer of the surrogate function as the approximate final solution.…”
Section: Optimization Perspective
confidence: 99%
“…BitFit 14 updates the bias terms in PLMs while freezing the remaining modules. Low-rank adaptation (LoRA) 15 decomposes the attention weight update into low-rank matrices to reduce the number of trainable parameters. The delta-tuning methods enable efficient tuning and practical usage for large pre-trained models and often achieve results comparable to standard fine-tuning.…”
confidence: 99%
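The two delta-tuning methods named in the statement above can be made concrete with a minimal NumPy sketch. This is not code from either cited paper; the matrix sizes and zero-initialization convention are illustrative assumptions, and the point is only the parameter accounting: LoRA trains a rank-r factored update to a frozen weight, while BitFit trains only a bias vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight (e.g., one attention projection), d x d.
d, r = 64, 4  # r << d is the low-rank bottleneck
W = rng.standard_normal((d, d))

# LoRA-style adapters: the weight update is factored as B @ A, so only
# 2*d*r parameters are trainable instead of d*d for full finetuning.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))  # zero init: the adapted model starts identical to the base

def lora_forward(x):
    # Base output plus the low-rank delta; W itself is never updated.
    return x @ W.T + x @ (B @ A).T

# BitFit-style tuning, by contrast, trains only the per-layer bias vector.
bias = np.zeros(d)  # the sole trainable parameter for this layer

def bitfit_forward(x):
    return x @ W.T + bias

x = rng.standard_normal((2, d))
# Before any training, both adapted models equal the frozen base model.
base_out = x @ W.T
```

With d = 64 and r = 4, the LoRA adapters hold 2·64·4 = 512 trainable values versus 4096 for the full matrix, which is the parameter-efficiency trade-off the quoted passage describes.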
“…However, these foundational models have learned robust semantic representations of text. They can be fine-tuned for specific tasks with relatively few labeled examples (even as few as 100 or fewer) using techniques such as transfer learning 21 or meta-learning, 22, 23 and still achieve state-of-the-art performance.…”
Section: Proposed Alternative
confidence: 99%
“…By tapping into the abundance of unlabeled data, our model gains a robust understanding of diverse molecular structures. Remarkably, we are able to selectively fine-tune specific aspects of the model in a low-rank manner 57, particularly when dealing with intricate molecular geometry configurations such as binding scenarios or conformers exhibiting high energy states. An additional rationale for favoring diffusion models in molecule optimization lies in their global optimization approach.…”
Section: Discussion
confidence: 99%