2020
DOI: 10.1609/aaai.v34i05.6249

Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning

Abstract: Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates…
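As a rough illustration (not the paper's exact formulation), the sketch below combines the three ingredients the abstract names: a standard maximum-likelihood loss, a policy-gradient (REINFORCE-with-baseline) reward term, and an OT distance between generated and reference texts used as a regularizer. The function name, the weights lambda_rl and lambda_ot, and the assumption that the OT cost is computed separately (e.g. by Sinkhorn iterations, sketched under the Related Work excerpt below) are all illustrative assumptions.

import torch
import torch.nn.functional as F

def ot_enhanced_objective(logits, targets, sample_logp, sample_reward,
                          baseline_reward, ot_cost,
                          lambda_rl=1.0, lambda_ot=0.1):
    """Illustrative combined loss: MLE + policy-gradient RL + OT regularizer.

    logits:          (batch, seq, vocab) decoder outputs under teacher forcing
    targets:         (batch, seq) ground-truth token ids
    sample_logp:     (batch,) log-likelihood of a sampled sequence under the policy
    sample_reward:   (batch,) sequence-level reward (e.g. BLEU/CIDEr) of the sample
    baseline_reward: (batch,) reward of a baseline decode (e.g. greedy)
    ot_cost:         scalar OT distance between generated and reference texts
    """
    # Standard maximum-likelihood (token-level cross-entropy) term.
    mle = F.cross_entropy(logits.transpose(1, 2), targets)

    # REINFORCE with a baseline: advantage-weighted negative log-likelihood
    # of the sampled sequence; the advantage is treated as a constant.
    advantage = (sample_reward - baseline_reward).detach()
    rl = -(advantage * sample_logp).mean()

    # OT distance between generated and ground-truth texts as a regularizer.
    return mle + lambda_rl * rl + lambda_ot * ot_cost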

Cited by 11 publications (8 citation statements)
References 31 publications
“…Muzellec and Cuturi [52] and others [15,23,73] embed words or sentences into distributions instead of vectors and compute the distance between embeddings using OT. Chen et al. [7] and others [44,8] regularize text generation models with the OT distance between the generated texts and the ground-truth texts to improve generation. Nested Wasserstein [88] compares the distributions of sequences and has been used successfully in imitation learning for text generation.…”
Section: Related Work
confidence: 99%
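A minimal sketch of one way the OT distance between a generated text and a ground-truth text mentioned above could be computed: each text is treated as a uniform distribution over its token embeddings, and an entropy-regularized (Sinkhorn) iteration approximates the transport cost. The cosine cost, uniform marginals, and hyperparameters are assumptions for illustration, not any cited paper's exact recipe.

import torch
import torch.nn.functional as F

def sinkhorn_ot_distance(gen_emb, ref_emb, eps=0.1, n_iters=50):
    """Entropy-regularized OT cost between two token-embedding sets.

    gen_emb: (n, d) embeddings of generated tokens
    ref_emb: (m, d) embeddings of ground-truth tokens
    """
    # Cosine cost matrix between generated and reference tokens.
    cost = 1.0 - F.normalize(gen_emb, dim=-1) @ F.normalize(ref_emb, dim=-1).t()

    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)                # uniform marginal, generated tokens
    b = torch.full((m,), 1.0 / m)                # uniform marginal, reference tokens

    K = torch.exp(-cost / eps)                   # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):                     # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u + 1e-9)
        u = a / (K @ v + 1e-9)

    plan = u.unsqueeze(1) * K * v.unsqueeze(0)   # approximate transport plan
    return (plan * cost).sum()                   # transport cost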
“…Among these formulas, the advantage function yields almost the lowest possible variance, though in practice the advantage function is not known and must be estimated (Schulman et al. 2016). Therefore, we use the advantage function instead of the other formulas, which differs from (Rennie et al. 2017; Ranzato et al. 2016; Wu et al. 2018; Chen et al. 2020b), and the gradient ∇_θ L_RL(θ) can be computed with the advantage function:…”
Section: Policy Gradient Training
confidence: 99%
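The gradient formula the excerpt alludes to is not reproduced in the snippet; a standard advantage-based policy-gradient estimator of the form it describes (with the advantage Â estimated rather than known exactly, e.g. following Schulman et al. 2016) would be

∇_θ L_RL(θ) = −E_{a_t ∼ π_θ}[ Â(s_t, a_t) · ∇_θ log π_θ(a_t | s_t) ],

which the sampled-sequence term in the objective sketch above instantiates with a single-sample, greedy-baseline estimate of Â.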
“…• OTRL. Optimal-Transport-Enhanced RL (OTRL) (Chen et al. 2020a) combines RL with optimal-transport learning (Chen et al. 2019) and obtains state-of-the-art performance on the MSCOCO dataset. They also use Top-down as the baseline model.…”
Section: Image Captioning
confidence: 99%
“…Optimal transport (OT) focuses on finding the optimal transport plan between two different distributions, and its distance-optimization problem is highly similar to the offloading problem we need to solve: transferring tasks from EDs to computation nodes that provide computing services [14]. We formulate the joint optimization as an MDP and develop an Optimal-Transport-Based RL approach to solve it [15]. The approach builds on policy-based RL to minimize a weighted sum of the processing delay of all tasks and the end devices' energy consumption.…”
Section: Introduction
confidence: 99%
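A minimal sketch of the kind of per-step reward such a formulation implies, with the agent penalized by a weighted sum of total processing delay and end-device energy consumption; the signature and weights are illustrative assumptions, not the cited work's exact design.

def offloading_reward(task_delays, device_energies, w_delay=0.5, w_energy=0.5):
    """Hypothetical reward for the offloading MDP described above: larger when
    the weighted sum of total processing delay and end-device energy
    consumption is smaller (weights are illustrative assumptions)."""
    total_delay = sum(task_delays)           # processing delay over all tasks
    total_energy = sum(device_energies)      # energy consumed by end devices
    return -(w_delay * total_delay + w_energy * total_energy)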