Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2023
DOI: 10.18653/v1/2023.acl-long.427

RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs

Abstract: Despite their unprecedented success, even the largest language models make mistakes. Similar to how humans learn and improve using feedback, previous work proposed providing language models with natural language feedback to guide them in repairing their outputs. Because human-generated critiques are expensive to obtain, researchers have devised learned critique generators in lieu of human critics while assuming one can train downstream models to utilize generated feedback. However, this approach does not apply…
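The abstract describes a pipeline in which a learned critique generator produces natural-language feedback and a downstream model uses that feedback to repair its output; the paper title indicates the critique generator is trained with reinforcement learning. The Python sketch below illustrates one plausible shape of such a critique-then-repair loop; the Critic and TaskModel interfaces, function names, and stopping rule are illustrative assumptions, not the RL4F implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces standing in for a small, trainable critique model and a
# large, frozen task model; these are placeholders, not the RL4F code.

@dataclass
class Critic:
    generate_critique: Callable[[str, str], str]      # (question, answer) -> feedback text

@dataclass
class TaskModel:
    generate_answer: Callable[[str], str]             # question -> initial answer
    revise_answer: Callable[[str, str, str], str]     # (question, answer, feedback) -> revision


def repair_loop(question: str,
                reference: str,
                critic: Critic,
                task_model: TaskModel,
                score: Callable[[str, str], float],
                max_rounds: int = 3) -> tuple[str, float]:
    """Critique-then-repair loop.

    The end-task metric score(revision, reference) is the kind of quantity a
    policy-gradient method (e.g. PPO) could use as the reward for training the
    critic; the task model itself stays fixed throughout.
    """
    answer = task_model.generate_answer(question)
    best = score(answer, reference)
    for _ in range(max_rounds):
        feedback = critic.generate_critique(question, answer)
        revised = task_model.revise_answer(question, answer, feedback)
        revised_score = score(revised, reference)
        if revised_score <= best:      # stop once feedback no longer improves the output
            break
        answer, best = revised, revised_score
    return answer, best
```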

Cited by 7 publications (2 citation statements). References 24 publications.

Citation statements:
“…CRITIC (Gou et al., 2023) proposes the use of a suite of specialized tools for a variety of tasks such as code interpreters, calculators, or search engines to generate critics for the LLM's generated output. Moreover, approaches such as REFINER (Paul et al., 2023), CodeRL and RL4F (Akyurek et al., 2023) propose to train a specialized critic to provide feedback to the generator model.…”
Section: Related Work
Citation type: mentioning, confidence: 99%
“…Various methods have been proposed to tackle this problem (Pan et al., 2023). From training-time correction (Li et al., 2019; Jauregi Unanue et al., 2021; Zelikman et al., 2022; Huang et al., 2022) to post-output-generation refinement (Madaan et al., 2023; Shinn et al., 2023; Zhang et al., 2023; Pan et al., 2023; Yu et al., 2023; Gou et al., 2023; Paul et al., 2023; Akyurek et al., 2023), these methods have shown the impact that iterative self-refinement and proper feedback can have on the performance of LLMs.…”
Section: Introduction
Citation type: mentioning, confidence: 99%