GPT Understands, Too

Liu, Xiao; Zheng, Yuan; Du, Zhengxiao; Ding, Ming; Qian, Yingyi; Yang, Zhou; Tang, Jie

doi:10.48550/arxiv.2103.10385

Cited by 110 publications

(161 citation statements)

References 0 publications

Mentioning: 158

“…$\in \mathbb{R}^{l \times d/N_h}$ denote the i-th head vector. Prompt-tuning (Lester et al., 2021) simplifies prefix-tuning by only prepending to the input word embeddings in the first layer; similar work also includes P-tuning (Liu et al., 2021b).…”

Section: Overview Of Previous Parameter-efficient Tuning Methods (mentioning)

confidence: 99%

Towards a Unified View of Parameter-Efficient Transfer Learning

He, Zhou, Ma, et al. 2021

Preprint


Fine-tuning large pretrained language models on downstream tasks has become the de-facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pretrained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connections among the various methods are poorly understood. In this paper, we break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them. Specifically, we re-frame them as modifications to specific hidden states in pretrained models, and define a set of design dimensions along which different methods vary, such as the function to compute the modification and the position to apply the modification. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.


“…We suspect that effective music audio generation necessitates intermediate representations that would also contain useful information for MIR. This hypothesis is further motivated by an abundance of previous work in NLP suggesting that generative and self-supervised pre-training can yield powerful representations for discriminative tasks [22][23][24][25].…”

Section: Calm Pre-training (mentioning)

confidence: 99%

Codified audio language modeling learns useful representations for music information retrieval

Castellon, Donahue, Liang 2021

Preprint


We demonstrate that language models pre-trained on codified (discretely-encoded) music audio learn representations that are useful for downstream MIR tasks. Specifically, we explore representations from Jukebox [1]: a music generation system containing a language model trained on codified audio from 1M songs. To determine if Jukebox's representations contain useful information for MIR, we use them as input features to train shallow models on several MIR tasks. Relative to representations from conventional MIR models which are pre-trained on tagging, we find that using representations from Jukebox as input features yields 30% stronger performance on average across four MIR tasks: tagging, genre classification, key detection, and emotion recognition. For key detection, we observe that representations from Jukebox are considerably stronger than those from models pre-trained on tagging, suggesting that pre-training via codified audio language modeling may address blind spots in conventional approaches. We interpret the strength of Jukebox's representations as evidence that modeling audio instead of tags provides richer representations for MIR. (Footnote 1: MIR has a broad definition, but in this paper "MIR" refers specifically to making discriminative predictions on music audio.)


“…Methods [25,61] have been proposed to automate the prompt engineering process. The prompting process does not tune any of the parameters, which is empirically sub-optimal compared to fine-tuning [43].…”

Section: Related Work (mentioning)

confidence: 99%

“…For few-shot scenario, [20] proves that prompt tuning can be much better than traditional fine-tuning. When the training data is sufficient, prompt tuning performs slightly worse than fine-tuning [43]. However, the performance gap from full-model fine-tuning closes up as the pre-trained model gets larger [33,42].…”

Section: Related Work (mentioning)

confidence: 99%

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

Zhu, Zhu, et al. 2021

Preprint


Biological intelligence systems of animals perceive the world by integrating information in different modalities and processing simultaneously for various tasks. In contrast, current machine learning research follows a task-specific paradigm, leading to inefficient collaboration between tasks and high marginal costs of developing perception models for new tasks. In this paper, we present a generic perception architecture named Uni-Perceiver, which processes a variety of modalities and tasks with unified modeling and shared parameters. Specifically, Uni-Perceiver encodes different task inputs and targets from arbitrary modalities into a unified representation space with a modality-agnostic Transformer encoder and lightweight modality-specific tokenizers. Different perception tasks are modeled as the same formulation, that is, finding the maximum likelihood target for each input through the similarity of their representations. The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage. Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks. The performance can be improved to a level close to state-of-the-art methods by con-
