Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.123
Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning

Abstract: Likelihood training and maximization-based decoding result in dull and repetitive generated texts even when using powerful language models (Holtzman et al., 2019). Adding a loss function for regularization was shown to improve text generation output by helping avoid unwanted properties, such as contradiction or repetition (Li et al., 2020). In this work, we propose fine-tuning a language model by using policy gradient reinforcement learning, directly optimizing for better generation. We apply this approach to …
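The approach described in the abstract (policy-gradient fine-tuning of a language model against a sequence-level generation metric, such as a repetition penalty) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the 4-gram repetition reward, and the plain-float stand-ins for autograd tensors are all assumptions.

```python
from collections import Counter

def four_gram_repetition(tokens):
    """Fraction of 4-grams in `tokens` that repeat an earlier 4-gram.

    Lower is better; the negated value can serve as a reward signal
    discouraging repetitive generations.
    """
    if len(tokens) < 4:
        return 0.0
    grams = [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]
    counts = Counter(grams)
    repeats = sum(c - 1 for c in counts.values())  # extra occurrences only
    return repeats / len(grams)

def reinforce_loss(log_probs, reward, baseline=0.0):
    """REINFORCE-style policy-gradient objective for one sampled sequence.

    `log_probs` are per-token log-probabilities of the sampled tokens under
    the model (plain floats here; autograd tensors in a real setup, so that
    minimizing this loss pushes the model toward higher-reward sequences).
    """
    advantage = reward - baseline  # baseline reduces gradient variance
    return -advantage * sum(log_probs)

# Sampled sequence "a b c d a b c d" repeats one 4-gram out of five.
sample = list("abcdabcd")
reward = -four_gram_repetition(sample)          # -0.2: penalize repetition
loss = reinforce_loss([-1.0, -2.0], reward)     # toy per-token log-probs
```

In practice the reward would be computed per sampled batch and the loss combined with the usual likelihood objective, as the citing papers below describe.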

Cited by 9 publications (7 citation statements)
References 6 publications
“…Test-time metrics are not the only quantities that can be maximized through RL. For example, Lagutin et al. (2021) suggest considering the count of 4-gram repetitions in the generated text, to reduce the likelihood of undesirable results. The combination of these techniques and classic self-supervised learning helps learn both how to write and how not to write.…”
Section: Overview
confidence: 99%
“…Samples that reflect some particular undesirable behavior are called negative data, which has been studied to help the model correct such behavior (He and Glass 2020; Welleck et al. 2020; Lagutin, Gavrilov, and Kalaidin 2021). He and Glass (2020) conduct negative updates with training signals provided by negative samples to avoid the model generating such data.…”
Section: Learning From Negative Views
confidence: 99%
“…In this work, following Devaraj et al. (2021) we use unlikelihood (UL) training (Welleck et al., 2020) to encourage the generation of simplified terminology. This strategy has been used in other domains to penalize inaccuracy (Hu et al., 2023; Nan et al., 2022), complexity (Devaraj et al., 2021; Lu et al., 2023), and redundancy (Lagutin et al., 2021) in text generation. Unlike Devaraj et al. (2021), our work adapts UL to optimize for both readability and factual consistency.…”
Section: Related Work
confidence: 99%