2022
DOI: 10.48550/arXiv.2203.02155
Preprint

Training language models to follow instructions with human feedback

Abstract: Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we …
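
The alignment procedure sketched in the abstract relies on reinforcement learning from human feedback: labelers compare model outputs, a reward model is trained to reproduce their preferences, and the language model is then optimized against that reward. As a rough illustration of the reward-model step, the sketch below minimizes the pairwise comparison loss -log σ(r_chosen − r_rejected); the ToyRewardModel over random feature vectors is an assumption for brevity, not the paper's actual language-model backbone or data.

```python
# Minimal sketch (illustrative only) of the pairwise reward-model loss used in
# RLHF-style training: the labeler-preferred response should score higher than
# the rejected one. A toy linear model over fixed-size "response features"
# stands in for the actual language-model backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyRewardModel(nn.Module):
    def __init__(self, feature_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(feature_dim, 1)  # maps response features to a scalar reward

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.score(features).squeeze(-1)


def pairwise_preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch of comparisons.
    return -F.logsigmoid(r_chosen - r_rejected).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyRewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    # Hypothetical batch: features of preferred vs. rejected responses to the same prompts.
    chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
    for _ in range(100):
        optimizer.zero_grad()
        loss = pairwise_preference_loss(model(chosen), model(rejected))
        loss.backward()
        optimizer.step()
    print(f"final comparison loss: {loss.item():.4f}")
```

In the full pipeline described in the paper, the trained reward model then supplies the reward signal for reinforcement-learning fine-tuning of the policy model with PPO.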

Cited by 419 publications (508 citation statements)
References 65 publications
“…Looking forward, we expect our model performance to continue to increase with more parameters, data, and training steps [39,32]. Moreover, fine-tuning would allow our models to be better able to condition on natural language instructions and other indications of human intent [76,65,48]. Finally, our model lays a foundation for future work on supervised infilling & editing via model fine-tuning, as well as performing iterative decoding, where the model can be used to refine its own output [27].…”
Section: Discussion (mentioning)
confidence: 97%
“…For this reason, preference learning, uncertainty modeling and value alignment (Russell, 2019) are especially important for the design of human-compatible generalist agents. It may be possible to extend some of the value alignment approaches for language (Kenton et al., 2021; Ouyang et al., 2022) to generalist agents. However, even as technical solutions are developed for value alignment, generalist systems could still have negative societal impacts even with the intervention of well-intentioned designers, due to unforeseen circumstances or limited oversight (Amodei et al., 2016).…”
Section: Broader Impact (mentioning)
confidence: 99%
“…Wei et al. (2021) fine-tuned Google's internal 137B parameter pretrained LM on their curated suite of 60 datasets, producing a multi-tasked model called FLAN. Min et al. (2021) fine-tuned the 770M parameter GPT2 (Radford et al., 2019) on a curated suite of 142 datasets, and Ouyang et al. (2022) fine-tuned the 175B parameter GPT3 (Brown et al., 2020) on disparate datasets of human instructions, using reinforcement learning from human feedback, producing a new multi-tasked InstructGPT model.…”
Section: Input-Dependent Prompt Tuning for Multi-Tasking a Frozen LM (mentioning)
confidence: 99%
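
The excerpt above describes supervised fine-tuning of large LMs on curated instruction datasets. As a rough illustration of that objective, the sketch below computes next-token cross-entropy only over response tokens, so a toy autoregressive model learns to continue an instruction prompt with its answer; TinyLM, the vocabulary size, and the masking scheme are illustrative assumptions rather than any cited system's implementation.

```python
# Minimal sketch (illustrative only) of supervised instruction fine-tuning:
# standard next-token cross-entropy, computed only on response tokens so the
# model learns to map an instruction prompt to its answer. TinyLM and the
# random token IDs are placeholder assumptions, not any cited model or dataset.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 100  # toy vocabulary size


class TinyLM(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)  # next-token logits at each position


def sft_loss(logits: torch.Tensor, tokens: torch.Tensor, response_mask: torch.Tensor) -> torch.Tensor:
    # Shift so position t predicts token t+1, then average loss over response positions only.
    logits, targets, mask = logits[:, :-1], tokens[:, 1:], response_mask[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, VOCAB), targets.reshape(-1), reduction="none"
    ).reshape(targets.shape)
    return (per_token * mask).sum() / mask.sum()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyLM()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Hypothetical batch: [instruction tokens | response tokens]; the mask marks the response.
    tokens = torch.randint(1, VOCAB, (4, 12))
    response_mask = torch.zeros(4, 12)
    response_mask[:, 6:] = 1.0  # last six positions are the response
    loss = sft_loss(model(tokens), tokens, response_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"masked cross-entropy: {loss.item():.3f}")
```

Masking the prompt tokens is one common design choice; some instruction-tuning setups instead compute the loss over the full sequence.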
“…A side effect of doing so is that performance degrades significantly on other tasks. Partly in response, considerable recent work has been devoted to fine-tuning huge LMs simultaneously on many (in some cases, over 100) curated NLP tasks (Sanh et al., 2021; Wei et al., 2021; Min et al., 2021; Aribandi et al., 2021; Ouyang et al., 2022). These formidable efforts have been effective in the sense that they have produced models that exhibit high performance on inputs taken from any of the curated tasks, and, indeed, from other similar tasks.…”
Section: Introduction (mentioning)
confidence: 99%