P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks

Liu, Xiao; Ji, Kaixuan; Fu, Yicheng; Tam, Weng; Du, Zhengxiao; Yang, Zhou; Tang, Jie

doi:10.18653/v1/2022.acl-short.8

Cited by 399 publications (210 citation statements; 210 mentioning)

“…In this section, we show that DEEPSTRUCT successfully transfers to the structure prediction tasks considered and obtain state-of-the-art results on 21 of 28 datasets we evaluate. All results are obtained via structure pretraining a pretrained 10B parameter LM, GLM (Du et al, 2021). The details of the experimental setup, datasets, and comparison methods are described in Appendix A.…”

Section: Methods (mentioning; confidence: 99%)

“…We adopt the pretrained LMs from the (Du et al, 2021), whose energy cost and carbon footprint during pretraining were 80.6 MWh and 4.6 tCO2e, respectively. Additionally, the structure pretraining takes less than 5% gradient-steps of the number of pretraining steps of LMs, and thus the estimated auxiliary cost for energy is comparatively smaller.…”

Section: Environmental Considerations (mentioning; confidence: 99%)
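A rough check on the quoted bound. The quote does not say how the auxiliary cost was estimated, so this minimal sketch assumes that energy use and emissions scale linearly with the number of gradient steps, at the same per-step cost as pretraining. Under that assumption, structure pretraining costs at most 5% of the reported pretraining figures. The constant and function names are hypothetical.

```python
# Upper-bound estimate of the auxiliary cost of structure pretraining.
# Assumption (not from the source): energy and emissions scale linearly with
# gradient steps, and each step costs the same as a pretraining step.

PRETRAIN_ENERGY_MWH = 80.6   # reported pretraining energy (Du et al., 2021)
PRETRAIN_CO2E_T = 4.6        # reported pretraining footprint, tCO2e
STEP_FRACTION_BOUND = 0.05   # structure pretraining uses < 5% of the steps


def auxiliary_upper_bound(total: float, fraction: float) -> float:
    """Cost attributable to `fraction` of the steps, rounded to 3 decimals."""
    return round(total * fraction, 3)


energy_bound = auxiliary_upper_bound(PRETRAIN_ENERGY_MWH, STEP_FRACTION_BOUND)
co2e_bound = auxiliary_upper_bound(PRETRAIN_CO2E_T, STEP_FRACTION_BOUND)
print(f"< {energy_bound} MWh, < {co2e_bound} tCO2e")
```

Under this assumption the auxiliary cost stays below about 4.03 MWh and 0.23 tCO2e. A different batch size, sequence length, or hardware during structure pretraining would change the per-step cost and therefore the estimate.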

DeepStruct: Pretraining of Language Models for Structure Prediction

Wang, Liu, Chen et al. 2022

Preprint

Self Cite

We introduce a method for improving the structural understanding abilities of language models. Unlike previous approaches that finetune the models with task-specific augmentation, we pretrain language models on a collection of task-agnostic corpora to generate structures from text. Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks. We study the performance of this approach on 28 datasets, spanning 10 structure prediction tasks including open information extraction, joint entity and relation extraction, named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, factual probe, intent detection, and dialogue state tracking. We further enhance the pretraining with the task-specific training sets. We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets that we evaluate.

“…Since the summer of 2021 there has been a steady influx of research papers concerning prompt learning for common benchmarking open-NLP datasets such as Stanford Sentiment Treebank v2 (SST2), and the General Language Understanding Evaluation (GLUE) Liu et al [2021a], Brown et al [2020b], Sanh et al [2022], Lester et al [2021], Liu et al [2021b], Li and Liang [2021]. The datasets and tasks are standard in the field of NLP, and revolve around natural language understanding (NLU) tasks.…”

Section: Related Work (mentioning; confidence: 99%)

“…The common finding is that prompt learning can reach the performance of traditional fine-tuning, and often outperform in few-shot settings. Although the ability of prompt learning to match performance of traditional fine-tuning seems to scale with PLM size Liu et al [2021b]. One notable paper has investigated the use of GPT-3 for biomedical text datasets in a few-shot setting, finding a decrease in performance when compared to similar tasks in the standard NLU datasets Moradi et al [2021].…”

Section: Related Work (mentioning; confidence: 99%)

Clinical Prompt Learning with Frozen Language Models

Taylor, Zhang, Joyce et al. 2022

Preprint

Prompt learning is a new paradigm in the Natural Language Processing (NLP) field which has shown impressive performance on a number of natural language tasks with common benchmarking text datasets in full, few-shot, and zero-shot train-evaluation setups. Recently, it has even been observed that large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models. However, as with many recent NLP trends, even the largest PLMs, such as GPT-3, do not perform well on specialized domains (e.g. medical text), and the common practice to achieve State of the Art (SoTA) results still consists of pre-training and fine-tuning the PLMs on downstream tasks. The reliance on fine-tuning large PLMs is problematic in clinical settings where data is often held in non-GPU environments, and more resource-efficient methods of training specialized domain models are crucial. We investigated the viability of prompt learning on clinically meaningful decision tasks and directly compared it with more traditional fine-tuning methods. Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning with substantially fewer trainable parameters and requiring less training data. We argue that prompt learning therefore provides lower computational resource costs applicable to clinical settings, and can serve as an alternative to fine-tuning ever-larger PLMs.

“…Lester et al (2021) propose a further simplified approach called prompt tuning, which only tunes the additional tunable tokens prepended to the input text. P-tuning v2 (Liu et al, 2021b) adapted the idea of prompt tuning by adding prompts in different layers as pre-fix tokens rather than only the input embedding. Its performance can be comparable to full-model tuning across both scales and tasks.…”

Section: Related Work (mentioning; confidence: 99%)
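The difference between tuning prompts only at the input and adding them at every layer can be made concrete by counting trainable parameters. This is a sketch under stated assumptions: the model dimensions are illustrative (roughly BERT-large-sized, not taken from the paper), and the deep variant is modeled the way prefix-style implementations commonly do it, with one learned key vector and one learned value vector per prompt position in every layer. `ModelShape`, `prompt_tuning_params`, and `deep_prompt_params` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Illustrative transformer dimensions (hypothetical, not from the paper)."""
    num_layers: int
    hidden_size: int


def prompt_tuning_params(shape: ModelShape, prompt_len: int) -> int:
    # Prompt tuning (Lester et al., 2021): only the soft tokens prepended to
    # the input embeddings are trained, one vector per prompt position.
    return prompt_len * shape.hidden_size


def deep_prompt_params(shape: ModelShape, prompt_len: int) -> int:
    # Deep prompts in the style of P-tuning v2: prompts enter every layer.
    # Assumption: each layer learns a prefix key and a prefix value vector
    # per prompt position, hence the factor of 2.
    return prompt_len * shape.hidden_size * shape.num_layers * 2


shape = ModelShape(num_layers=24, hidden_size=1024)
shallow = prompt_tuning_params(shape, prompt_len=20)
deep = deep_prompt_params(shape, prompt_len=20)
print(shallow, deep, deep // shallow)  # 20480 983040 48
```

In both cases the backbone stays frozen. Under these assumptions the deep variant trains `2 * num_layers` times as many prompt values (here about 0.98M), which buys it direct influence on every layer's attention rather than only on the input sequence.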

Learning a Better Initialization for Soft Prompts via Meta-Learning

Huang, Qian, Zhou 2022

Preprint

Prompt tuning (PT) is an effective approach to adapting pre-trained language models to downstream tasks. Without a good initialization, prompt tuning doesn't perform well under few-shot settings, so pre-trained prompt tuning (PPT) (Gu et al., 2022a) was proposed to initialize prompts by leveraging pre-training data. We propose MetaPT (Meta-learned Prompt Tuning) to further improve PPT's initialization by considering latent structure within the pre-training data. Specifically, we introduce the structure by first clustering pre-training data into different auxiliary tasks with unsupervised methods. Then we use these tasks to pre-train prompts with a meta-learning algorithm. Such a process can make prompts learn a better initialization by discovering commonalities among these auxiliary tasks. We evaluate our method on seven downstream tasks. Our MetaPT achieves better and more stable performance than the state-of-the-art method.
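The two-stage recipe in this abstract (cluster unlabeled data into auxiliary tasks, then meta-learn a prompt initialization over them) can be illustrated with a toy sketch. Everything concrete here is an assumption, not MetaPT's actual method: k-means stands in for the unspecified unsupervised clustering, Reptile (a first-order meta-learning algorithm) stands in for the meta-learning algorithm, the "prompt" is a 2-D vector, and each task's loss is the squared distance to its cluster centroid. The data and all function names are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vec]) -> Vec:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Vec], k: int, iters: int = 10) -> List[List[Vec]]:
    """Plain k-means (stand-in clustering); deterministic, evenly spaced init."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    clusters: List[List[Vec]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return clusters


def adapt(prompt: Vec, target: Vec, lr: float, steps: int) -> Vec:
    """Inner loop: gradient descent on the toy task loss ||prompt - target||^2."""
    p = list(prompt)
    for _ in range(steps):
        p = [x - lr * 2.0 * (x - t) for x, t in zip(p, target)]
    return tuple(p)


def reptile(targets: List[Vec], dim: int, meta_lr: float = 0.5,
            inner_lr: float = 0.1, inner_steps: int = 5,
            meta_iters: int = 200) -> Vec:
    """Batched Reptile: move the init toward the average task-adapted prompt."""
    init = [0.0] * dim
    for _ in range(meta_iters):
        adapted = [adapt(tuple(init), t, inner_lr, inner_steps) for t in targets]
        avg = mean(adapted)
        init = [x + meta_lr * (a - x) for x, a in zip(init, avg)]
    return tuple(init)


# Hypothetical unlabeled "pre-training" features (e.g. sentence embeddings).
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
        (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
clusters = kmeans(data, k=2)
# Each cluster becomes an auxiliary task whose toy optimum is its centroid.
task_targets = [mean(c) for c in clusters]
prompt_init = reptile(task_targets, dim=2)
print([round(x, 3) for x in prompt_init])  # [5.333, 5.333]
```

With equally weighted quadratic tasks, the batched Reptile update has its fixed point at the average of the task optima, so the learned initialization sits between the clusters rather than at either one. Real prompt-tuning losses are not quadratic, so this only illustrates the mechanics of clustering followed by meta-learning.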
