Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
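The abstract describes a two-step recipe: supervised fine-tuning on labeler demonstrations, then reinforcement learning from human feedback. A common way to implement the RL step is to score each sampled output with a learned reward model and subtract a KL-style penalty that keeps the tuned policy close to the supervised model. The sketch below shows that shaped reward with toy numbers; the formula, the `beta` coefficient, and all names are illustrative assumptions rather than details stated in the abstract.

```python
def shaped_rlhf_reward(rm_score, logp_rl, logp_sft, beta=0.02):
    """Reward for one sampled output during RL fine-tuning: the reward-model
    score minus a KL-style penalty (log-prob under the policy being tuned
    minus log-prob under the frozen supervised model). The penalty discourages
    the tuned model from drifting far from the supervised starting point;
    `beta` sets how strongly."""
    return rm_score - beta * (logp_rl - logp_sft)

# Toy numbers: the reward model likes this output (score 1.8), but the tuned
# policy already assigns it more probability than the supervised model did,
# so part of the score is given back as a penalty.
print(shaped_rlhf_reward(rm_score=1.8, logp_rl=-12.0, logp_sft=-20.0))  # 1.64
```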
Maize, the highest-yielding cereal crop worldwide, is particularly susceptible to drought during its 2- to 3-week flowering period. Many genetic engineering strategies for drought tolerance impinge on plant development, reduce maximum yield potential or do not translate from laboratory conditions to the field. We overexpressed a gene encoding a rice trehalose-6-phosphate phosphatase (TPP) in developing maize ears using a floral promoter. This reduced the concentration of trehalose-6-phosphate (T6P), a sugar signal that regulates growth and development, and increased the concentration of sucrose in ear spikelets. Overexpression of TPP increased both kernel set and harvest index. Field data at several sites and over multiple seasons showed that the engineered trait improved yields from 9% to 49% under non-drought or mild drought conditions, and from 31% to 123% under more severe drought conditions, relative to yields from nontransgenic controls.
As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about: summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts [57] and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles [21], producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want.
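The comparison data described above becomes a training signal by requiring the reward model to score the human-preferred summary above the rejected one. Below is a minimal numpy sketch of that pairwise, Bradley-Terry-style logistic loss; the function name and the toy scores are hypothetical, and real training would backpropagate this loss through a large model rather than evaluate it on fixed numbers.

```python
import numpy as np

def pairwise_preference_loss(score_preferred, score_rejected):
    """Negative log-likelihood that the preferred summary wins under a
    Bradley-Terry model of the human comparison: -log sigmoid(s_pref - s_rej).
    Minimizing this pushes the reward model to score the human-preferred
    summary higher than the rejected one."""
    margin = np.asarray(score_preferred) - np.asarray(score_rejected)
    return float(np.mean(np.log1p(np.exp(-margin))))  # = -mean log sigmoid(margin)

# Toy batch of reward-model scores for (preferred, rejected) summary pairs.
preferred = [2.1, 0.4, 1.3]
rejected = [1.0, 0.9, -0.2]
print(pairwise_preference_loss(preferred, rejected))
```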
We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
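Rejection sampling against a reward model, as mentioned above, amounts to a best-of-n search: sample several candidate answers from the behavior-cloned model and return the one the reward model scores highest. The sketch below assumes hypothetical `generate` and `reward_model` callables; it illustrates only the selection logic, not the browsing environment or the models themselves.

```python
import random

def best_of_n(prompt, generate, reward_model, n=16, seed=0):
    """Draw n candidate answers from the fine-tuned model and keep the one
    the reward model scores highest. `generate` and `reward_model` are
    stand-ins for the real models."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda ans: reward_model(prompt, ans))

# Toy stand-ins so the sketch runs end to end.
def toy_generate(prompt, rng):
    return f"answer-{rng.randint(0, 999)}"

def toy_reward_model(prompt, answer):
    return int(answer.split("-")[1])  # prefers higher-numbered answers

print(best_of_n("Why is the sky blue?", toy_generate, toy_reward_model, n=4))
```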
In ASL, two-handed signs fall into three major sets. In one set the hands have different shapes and either only the dominant hand moves or the hands move as a unit. Battison's Dominance Condition was intended to account for the fact that the non-dominant hand typically assumes an unmarked shape when it is stationary. However, we show that the non-dominant hand does this even when the hands move as a unit. In the second set the hands have the same shape and only the dominant hand moves. These signs are unrestricted for handshape. In the third set the hands have the same shape and both move. Battison's Symmetry Condition was intended to account for restrictions on the parameters of these signs. We argue that four basic types of symmetry transformations occur, with various complications: reflection, rotation, translation, and glide reflection. Each calls for conditions specific to it, and together they lead to an overriding condition on movement in symmetry transformation signs. The conditions uncovered here might be morpheme structure constraints or, instead, simply follow from physiological limitations of hands in motion.