2022
DOI: 10.48550/arxiv.2204.10019
Preprint

Standing on the Shoulders of Giant Frozen Language Models

Abstract: Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM (i.e., leaving its weights untouched) still often underperform fine-tuning approaches, which modify these weights in a task-dependent way. Those, in turn, suffer forgetfulness and compromise …

Cited by 3 publications (2 citation statements) · References 19 publications (45 reference statements)
“…Recent studies in Transformer-based language models such as ELMo [15] and BERT [4] have shown their capability to scale up model sizes with pre-training methodologies such as Masked Language Modeling [4]. Shortly after, several Large Language Models (LLMs), e.g., the GPT family [1,13], PaLM [2], Jurassic-X [9], Megatron-Turing [16], LaMDA [17], LLaMA [18], have emerged with huge numbers of parameters, up to 100B-5000B. They have shown great advantages in language modeling tasks such as arithmetic reasoning, commonsense reasoning, symbolic reasoning, and natural language inference.…”
Section: Related Work
confidence: 99%
“…Workflows are often more expensive than simpler approaches (e.g., Chain of Thought prompting) [35,38,45,53,62,90,104,105,110,134,150,157]. However, the cost is often less than hiring professionals [6,30,43,89,151], pretraining models [31,57,69,86,150], or completing the task yourself [135,156].…”
Section: Outcome
confidence: 99%