2022
DOI: 10.48550/arxiv.2208.02294
Preprint

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Abstract: Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (superv…

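The abstract (truncated above) describes pairing a supervised conversation-state embedding with an RL policy that plans the bot's next conversational move. As a rough, hypothetical illustration only, the sketch below scores pre-embedded candidate actions with a small Q-network over a fixed state embedding and picks the best one per turn; the module names, dimensions, and greedy selection rule are assumptions for illustration, not the paper's actual architecture.

```python
# Hypothetical sketch: score candidate bot actions (e.g., conversational
# "skills" or candidate utterances) against a fixed, supervised
# conversation-state embedding. Dimensions and names are illustrative.
import torch
import torch.nn as nn

class QScorer(nn.Module):
    def __init__(self, state_dim: int = 768, action_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),  # scalar Q-value for one (state, action) pair
        )

    def forward(self, state_emb: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state_emb, action_emb], dim=-1)).squeeze(-1)

def select_action(scorer: QScorer, state_emb: torch.Tensor, candidate_embs: torch.Tensor) -> int:
    # Greedy choice among pre-embedded candidate actions for one dialogue turn.
    with torch.no_grad():
        q = scorer(state_emb.expand(candidate_embs.size(0), -1), candidate_embs)
    return int(q.argmax().item())

# Example: pick one of 5 candidate actions for the current conversation state.
scorer = QScorer()
state = torch.randn(768)          # stand-in for a supervised state embedding
candidates = torch.randn(5, 128)  # stand-in for candidate action embeddings
print(select_action(scorer, state, candidates))
```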
Cited by 2 publications (2 citation statements) · References 32 publications

“…Openness is perhaps especially important for today's breed of instruction-following text generators, of which ChatGPT is the best known example. The persuasiveness of these language models is due in large part to an additional reinforcement learning component in which text generator output is pruned according to a reward function that is based on human feedback [12,43,59], using insights from early work on evaluative reinforcement [24,26,55]. Human users appear to be highly susceptible to the combination of interactivity and fluid text generation offered by this technology.…”
Section: Why Openness Matters (citation type: mentioning)
confidence: 99%
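The statement above describes an RL component in which generator output is pruned according to a reward function learned from human feedback. As a loose, hedged illustration of the pruning idea only (not the training procedure of ChatGPT or any cited system), the sketch below does best-of-n reranking: sample several continuations and keep the one a stand-in reward model scores highest. The "gpt2" generator and the default sentiment-analysis checkpoint are placeholders for a real policy and a real human-feedback reward model.

```python
# Illustrative only: best-of-n reranking with a stand-in reward model, a much
# simpler cousin of "prune output by a learned reward function". The models
# used here are public placeholders, not those of any cited system.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
reward_model = pipeline("sentiment-analysis")  # placeholder for a human-feedback reward model

def reward(text: str) -> float:
    # Map the classifier output to a scalar "reward": probability of POSITIVE.
    out = reward_model(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

def best_of_n(prompt: str, n: int = 4) -> str:
    # Sample n continuations, score each with the reward model, keep the best.
    outputs = generator(prompt, num_return_sequences=n, do_sample=True, max_new_tokens=40)
    candidates = [o["generated_text"] for o in outputs]
    scores = [reward(c) for c in candidates]
    return max(zip(scores, candidates), key=lambda sc: sc[0])[1]

print(best_of_n("The assistant replied:"))
```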
“…On the other hand, offline RL (Fujimoto et al., 2019; Kumar et al., 2020; Brandfonbrener et al., 2021; Kostrikov et al., 2021) removes all need for environment interaction or user simulators, instead operating purely on static datasets of prior human interaction. There are many closely related works (Jaques et al., 2019, 2020; Snell et al., 2022; Cohen et al., 2022; Verma et al., 2022; Jang et al., 2022) based on offline RL that perform policy improvement via behavior cloning of self-generated utterances, which inherits the ability of pre-trained language models to generate human-like responses. In RL parlance, such methods could be considered policy extraction with approximate dynamic programming.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
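The statement above characterizes several offline-RL dialogue methods as policy improvement via behavior cloning of self-generated utterances. A minimal, hypothetical sketch of that general pattern follows: generate candidate responses, keep only those a learned value estimate rates above a threshold, and use the kept pairs as a supervised (behavior-cloning) fine-tuning set. All function names are placeholders, not APIs from any cited work.

```python
# Hedged sketch of "policy improvement via behavior cloning of self-generated
# utterances": sample responses from the current model, filter them with a
# learned value/reward estimate, keep survivors as a supervised dataset.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    context: str
    response: str

def filtered_behavior_cloning(
    contexts: List[str],
    generate: Callable[[str, int], List[str]],  # placeholder: samples n responses per context
    value_fn: Callable[[str, str], float],      # placeholder: learned value/reward estimate
    n_samples: int = 4,
    threshold: float = 0.0,
) -> List[Example]:
    """Collect self-generated responses whose estimated value clears a threshold."""
    kept: List[Example] = []
    for ctx in contexts:
        for resp in generate(ctx, n_samples):
            if value_fn(ctx, resp) >= threshold:
                kept.append(Example(ctx, resp))
    # The kept pairs would then be used for ordinary supervised fine-tuning,
    # i.e. policy extraction without further environment interaction.
    return kept
```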