2020
DOI: 10.1609/aaai.v34i05.6249

Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning

Abstract: Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates…
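As a rough illustration (not the paper's exact formulation), the sketch below combines the three ingredients the abstract names: a standard maximum-likelihood loss, a policy-gradient (REINFORCE-with-baseline) reward term, and an OT distance between generated and reference texts used as a regularizer. The function name, the weights lambda_rl and lambda_ot, and the assumption that the OT cost is computed separately (e.g. by Sinkhorn iterations, sketched under the Related Work excerpt below) are all illustrative assumptions.

import torch
import torch.nn.functional as F

def ot_enhanced_objective(logits, targets, sample_logp, sample_reward,
                          baseline_reward, ot_cost,
                          lambda_rl=1.0, lambda_ot=0.1):
    """Illustrative combined loss: MLE + policy-gradient RL + OT regularizer.

    logits:          (batch, seq, vocab) decoder outputs under teacher forcing
    targets:         (batch, seq) ground-truth token ids
    sample_logp:     (batch,) log-likelihood of a sampled sequence under the policy
    sample_reward:   (batch,) sequence-level reward (e.g. BLEU/CIDEr) of the sample
    baseline_reward: (batch,) reward of a baseline decode (e.g. greedy)
    ot_cost:         scalar OT distance between generated and reference texts
    """
    # Standard maximum-likelihood (token-level cross-entropy) term.
    mle = F.cross_entropy(logits.transpose(1, 2), targets)

    # REINFORCE with a baseline: advantage-weighted negative log-likelihood
    # of the sampled sequence; the advantage is treated as a constant.
    advantage = (sample_reward - baseline_reward).detach()
    rl = -(advantage * sample_logp).mean()

    # OT distance between generated and ground-truth texts as a regularizer.
    return mle + lambda_rl * rl + lambda_ot * ot_cost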

Cited by 11 publications (8 citation statements)
References 31 publications
“…Muzellec and Cuturi [52] and others [15,23,73] embed words or sentences into distributions instead of vectors and compute the distance between embeddings using OT. Chen et al. [7] and others [44,8] regularize text generation models with the OT distance between the generated texts and the ground-truth texts to improve generation. Nested Wasserstein [88] compares the distributions of sequences and has been used successfully in imitation learning for text generation.…”
Section: Related Work
confidence: 99%
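A minimal sketch of one way the OT distance between a generated text and a ground-truth text mentioned above could be computed: each text is treated as a uniform distribution over its token embeddings, and an entropy-regularized (Sinkhorn) iteration approximates the transport cost. The cosine cost, uniform marginals, and hyperparameters are assumptions for illustration, not any cited paper's exact recipe.

import torch
import torch.nn.functional as F

def sinkhorn_ot_distance(gen_emb, ref_emb, eps=0.1, n_iters=50):
    """Entropy-regularized OT cost between two token-embedding sets.

    gen_emb: (n, d) embeddings of generated tokens
    ref_emb: (m, d) embeddings of ground-truth tokens
    """
    # Cosine cost matrix between generated and reference tokens.
    cost = 1.0 - F.normalize(gen_emb, dim=-1) @ F.normalize(ref_emb, dim=-1).t()

    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)                # uniform marginal, generated tokens
    b = torch.full((m,), 1.0 / m)                # uniform marginal, reference tokens

    K = torch.exp(-cost / eps)                   # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):                     # Sinkhorn fixed-point iterations
        v = b / (K.t() @ u + 1e-9)
        u = a / (K @ v + 1e-9)

    plan = u.unsqueeze(1) * K * v.unsqueeze(0)   # approximate transport plan
    return (plan * cost).sum()                   # transport cost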
“…Among these formulas, the advantage function yields almost the lowest possible variance, though in practice the advantage function is not known and must be estimated (Schulman et al. 2016). Therefore, we use the advantage function instead of the other formulas, which differs from (Rennie et al. 2017; Ranzato et al. 2016; Wu et al. 2018; Chen et al. 2020b), and the gradient ∇_θ L_RL(θ) can be computed with the advantage function:…”
Section: Policy Gradient Training
confidence: 99%
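The gradient formula the excerpt alludes to is not reproduced in the snippet; a standard advantage-based policy-gradient estimator of the form it describes (with the advantage Â estimated rather than known exactly, e.g. following Schulman et al. 2016) would be

∇_θ L_RL(θ) = −E_{a_t ∼ π_θ}[ Â(s_t, a_t) · ∇_θ log π_θ(a_t | s_t) ],

which the sampled-sequence term in the objective sketch above instantiates with a single-sample, greedy-baseline estimate of Â.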
“…• OTRL. Optimal-Transport-Enhanced RL (OTRL) (Chen et al. 2020a) combines RL with optimal-transport learning (Chen et al. 2019) and obtains state-of-the-art performance on the MSCOCO dataset. They also use Top-down as the baseline model.…”
Section: Image Captioning
confidence: 99%
“…Optimal transport (OT) focuses on finding the optimal transport plan between two different distributions, and its distance-optimization problem is highly similar to the offloading problem we need to solve: transferring tasks from EDs to computation nodes that provide computing services [14]. We formulate the joint optimization as an MDP and develop an Optimal-Transport-Based RL approach to solve it [15]. The approach builds on policy-based RL to minimize a weighted sum of the processing delay of all tasks and the end devices' energy consumption.…”
Section: Introduction
confidence: 99%
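A minimal sketch of the kind of per-step reward such a formulation implies, with the agent penalized by a weighted sum of total processing delay and end-device energy consumption; the signature and weights are illustrative assumptions, not the cited work's exact design.

def offloading_reward(task_delays, device_energies, w_delay=0.5, w_energy=0.5):
    """Hypothetical reward for the offloading MDP described above: larger when
    the weighted sum of total processing delay and end-device energy
    consumption is smaller (weights are illustrative assumptions)."""
    total_delay = sum(task_delays)           # processing delay over all tasks
    total_energy = sum(device_energies)      # energy consumed by end devices
    return -(w_delay * total_delay + w_energy * total_energy)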