2022
DOI: 10.48550/arxiv.2208.02294
Preprint

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Abstract: Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (superv…

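The abstract (truncated above) describes pairing a supervised conversation-state embedding with an RL policy that plans the bot's next conversational move. As a rough, hypothetical illustration only, the sketch below scores pre-embedded candidate actions with a small Q-network over a fixed state embedding and picks the best one per turn; the module names, dimensions, and greedy selection rule are assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch: score candidate bot actions (e.g., conversational
# "skills" or candidate utterances) against a fixed, supervised
# conversation-state embedding. Dimensions and names are illustrative.
import torch
import torch.nn as nn

class QScorer(nn.Module):
    def __init__(self, state_dim: int = 768, action_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # scalar Q-value for one (state, action) pair
        )

    def forward(self, state_emb: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state_emb, action_emb], dim=-1)).squeeze(-1)

def select_action(scorer: QScorer, state_emb: torch.Tensor, candidate_embs: torch.Tensor) -> int:
    # Greedy choice among pre-embedded candidate actions for one dialogue turn.
    with torch.no_grad():
        q = scorer(state_emb.expand(candidate_embs.size(0), -1), candidate_embs)
    return int(q.argmax().item())

# Example: pick one of 5 candidate actions for the current conversation state.
scorer = QScorer()
state = torch.randn(768)          # stand-in for a supervised state embedding
candidates = torch.randn(5, 128)  # stand-in for candidate action embeddings
print(select_action(scorer, state, candidates))
```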
Cited by 2 publications (2 citation statements) · References 32 publications

“…Openness is perhaps especially important for today's breed of instruction-following text generators, of which ChatGPT is the best known example. The persuasiveness of these language models is due in large part to an additional reinforcement learning component in which text generator output is pruned according to a reward function that is based on human feedback [12,43,59], using insights from early work on evaluative reinforcement [24,26,55]. Human users appear to be highly susceptible to the combination of interactivity and fluid text generation offered by this technology.…”
Section: Why Openness Matters (citation type: mentioning)
confidence: 99%
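The statement above describes an RL component in which generator output is pruned according to a reward function learned from human feedback. As a loose, hedged illustration of the pruning idea only (not the training procedure of ChatGPT or any cited system), the sketch below does best-of-n reranking: sample several continuations and keep the one a stand-in reward model scores highest. The "gpt2" generator and the default sentiment-analysis checkpoint are placeholders for a real policy and a real human-feedback reward model.

```python
# Illustrative only: best-of-n reranking with a stand-in reward model, a much
# simpler cousin of "prune output by a learned reward function". The models
# used here are public placeholders, not those of any cited system.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
reward_model = pipeline("sentiment-analysis")  # placeholder for a human-feedback reward model

def reward(text: str) -> float:
    # Map the classifier output to a scalar "reward": probability of POSITIVE.
    out = reward_model(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

def best_of_n(prompt: str, n: int = 4) -> str:
    # Sample n continuations, score each with the reward model, keep the best.
    outputs = generator(prompt, num_return_sequences=n, do_sample=True, max_new_tokens=40)
    candidates = [o["generated_text"] for o in outputs]
    scores = [reward(c) for c in candidates]
    return max(zip(scores, candidates), key=lambda sc: sc[0])[1]

print(best_of_n("The assistant replied:"))
```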
“…On the other hand, offline RL (Fujimoto et al., 2019; Kumar et al., 2020; Brandfonbrener et al., 2021; Kostrikov et al., 2021) removes all need for environment interaction or user simulators, instead operating purely on static datasets of prior human interaction. There are many closely related works (Jaques et al., 2019, 2020; Snell et al., 2022; Cohen et al., 2022; Verma et al., 2022; Jang et al., 2022) based on offline RL that perform policy improvement via behavior cloning of self-generated utterances, which inherits the ability of pre-trained language models to generate human-like responses. In RL parlance, such methods could be considered policy extraction with approximate dynamic programming.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
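The statement above characterizes several offline-RL dialogue methods as policy improvement via behavior cloning of self-generated utterances. A minimal, hypothetical sketch of that general pattern follows: generate candidate responses, keep only those a learned value estimate rates above a threshold, and use the kept pairs as a supervised (behavior-cloning) fine-tuning set. All function names are placeholders, not APIs from any cited work.

```python
# Hedged sketch of "policy improvement via behavior cloning of self-generated
# utterances": sample responses from the current model, filter them with a
# learned value/reward estimate, keep survivors as a supervised dataset.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    context: str
    response: str

def filtered_behavior_cloning(
    contexts: List[str],
    generate: Callable[[str, int], List[str]],  # placeholder: samples n responses per context
    value_fn: Callable[[str, str], float],      # placeholder: learned value/reward estimate
    n_samples: int = 4,
    threshold: float = 0.0,
) -> List[Example]:
    """Collect self-generated responses whose estimated value clears a threshold."""
    kept: List[Example] = []
    for ctx in contexts:
        for resp in generate(ctx, n_samples):
            if value_fn(ctx, resp) >= threshold:
                kept.append(Example(ctx, resp))
    # The kept pairs would then be used for ordinary supervised fine-tuning,
    # i.e. policy extraction without further environment interaction.
    return kept
```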