Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018)
DOI: 10.18653/v1/d18-1253

Subgoal Discovery for Hierarchical Dialogue Policy Learning

Abstract: Developing agents to engage in complex goal-oriented dialogues is challenging partly because the main learning signals are very sparse in long conversations. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given successful example dialogues, we propose the Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these…
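To make the two-level setup concrete, below is a minimal sketch of hierarchical policy learning with subgoals: a top-level policy picks a subgoal, a low-level policy picks dialogue acts and receives a dense intrinsic reward when the subgoal completes, while the sparse extrinsic reward only updates the top level. The environment, the subgoal and action names, and the tabular Q-learning updates are illustrative assumptions, not the paper's actual SDN/HRL implementation.

```python
import random
from collections import defaultdict

# Illustrative two-level dialogue policy (an exposition aid, not the paper's
# implementation). The top level chooses a subgoal; the low level chooses
# dialogue acts conditioned on (state, subgoal) and is trained on a dense
# intrinsic reward, while the sparse extrinsic reward drives the top level.

SUBGOALS = ["book_flight", "book_hotel"]               # assumed subgoals of a travel task
ACTIONS = ["request_slot", "inform_slot", "confirm"]   # assumed dialogue acts

q_top = defaultdict(float)   # Q(state, subgoal)
q_low = defaultdict(float)   # Q((state, subgoal), action)

def epsilon_greedy(q, state, options, eps=0.1):
    """Pick the highest-valued option for `state`, exploring with probability eps."""
    if random.random() < eps:
        return random.choice(options)
    return max(options, key=lambda o: q[(state, o)])

class ToyTravelEnv:
    """Toy stand-in for a composite travel-planning dialogue (pure assumption)."""
    def reset(self):
        self.pending = set(SUBGOALS)
        return frozenset(self.pending)

    def step(self, action, subgoal):
        subgoal_done = action == "confirm" and subgoal in self.pending
        if subgoal_done:
            self.pending.discard(subgoal)
        done = not self.pending
        extrinsic_r = 10.0 if done else 0.0            # sparse: only at full task success
        return frozenset(self.pending), extrinsic_r, done, subgoal_done

def run_episode(env, alpha=0.1, gamma=0.99):
    """One dialogue: commit to a subgoal until it is achieved, then pick the next."""
    state = env.reset()
    done = False
    while not done:
        subgoal = epsilon_greedy(q_top, state, SUBGOALS)
        start_state, extrinsic_return, subgoal_done = state, 0.0, False
        while not (subgoal_done or done):
            action = epsilon_greedy(q_low, (state, subgoal), ACTIONS)
            next_state, extrinsic_r, done, subgoal_done = env.step(action, subgoal)
            intrinsic_r = 1.0 if subgoal_done else 0.0  # dense signal for the low level
            best_next = max(q_low[((next_state, subgoal), a)] for a in ACTIONS)
            td = intrinsic_r + gamma * best_next - q_low[((state, subgoal), action)]
            q_low[((state, subgoal), action)] += alpha * td
            extrinsic_return += extrinsic_r
            state = next_state
        best_next_top = max(q_top[(state, g)] for g in SUBGOALS)
        td_top = extrinsic_return + gamma * best_next_top - q_top[(start_state, subgoal)]
        q_top[(start_state, subgoal)] += alpha * td_top

env = ToyTravelEnv()
for _ in range(200):
    run_episode(env)
```

In the paper the two policies are neural networks and the subgoals come from SDN rather than being listed by hand; the toy version above only illustrates how intrinsic subgoal rewards densify an otherwise sparse learning signal.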

Cited by 34 publications (22 citation statements), published 2018–2023
References 36 publications

“…[23] proposed an HRL approach based on the options framework to learn policies in different domains. In [24], the authors propose a divide-and-conquer approach for efficient policy learning, in which a complex goal-oriented task is broken into simpler subgoals in an unsupervised manner and these subgoals are then used to learn a multi-level policy with HRL. Feudal Reinforcement Learning has been combined with DQN in [25] to learn policies in large domains.…”
Section: PLOS ONE
confidence: 99%
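For readers unfamiliar with the options framework cited in [23] above, the following is a minimal sketch of what an option is: an initiation set, an intra-option policy, and a termination condition. The field names and type aliases are illustrative assumptions, not an API from any specific library.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable

State = Hashable   # illustrative alias: any hashable dialogue-state representation
Action = Hashable  # illustrative alias: any hashable dialogue act

@dataclass(frozen=True)
class Option:
    """One temporally extended action in the options framework (a sketch)."""
    initiation_set: FrozenSet[State]        # states where the option may be invoked
    policy: Callable[[State], Action]       # intra-option policy pi(s) -> a
    termination: Callable[[State], float]   # beta(s): probability the option stops in s

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set
```

A top-level policy then chooses among options rather than primitive actions, running each option's internal policy until its termination condition fires; in the dialogue setting of [24], each discovered subgoal plays roughly the role of one such option.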
“…For example, Kulkarni et al. [42] presented a hierarchical-DQN framework that integrates temporal abstraction with an intrinsic motivation mechanism to play the classic Atari game ‘Montezuma’s Revenge’, which is a major challenge for standard RL approaches. Tang et al. [43] used HRL to learn the dialogue policy for task-completion dialogue agents, where subgoals of the complex goal-directed task are learned automatically in an unsupervised fashion. Peng et al. [44] showed that the use of subgoals mitigates reward sparsity and leads to more effective exploration during learning.…”
Section: Related Work
confidence: 99%
“…Tang et al. proposed the Subgoal Discovery Network (SDN) [14] to divide complex goal-oriented tasks into subgoals in an unsupervised way. They also present a dialogue agent for the composite task of travel planning.…”
Section: Related Work
confidence: 99%
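SDN itself learns where subgoal boundaries fall with a sequence model over successful dialogues. As a crude stand-in for that idea only, the sketch below finds the best split of one trajectory into contiguous segments under an arbitrary per-segment scoring function; the dynamic program and the toy scorer are assumptions for illustration, not the paper's method.

```python
def best_segmentation(traj, max_segments, segment_score):
    """Split `traj` into at most `max_segments` contiguous segments so that the
    sum of segment_score(segment) is maximal; returns (score, segment end indices)."""
    n = len(traj)
    NEG = float("-inf")
    # best[k][i]: best score for splitting traj[:i] into exactly k segments
    best = [[NEG] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for i in range(1, n + 1):
            for j in range(k - 1, i):                  # last segment is traj[j:i]
                if best[k - 1][j] == NEG:
                    continue
                score = best[k - 1][j] + segment_score(traj[j:i])
                if score > best[k][i]:
                    best[k][i], back[k][i] = score, j
    k_best = max(range(1, max_segments + 1), key=lambda k: best[k][n])
    cuts, i = [], n
    for k in range(k_best, 0, -1):                     # recover the boundaries
        cuts.append(i)
        i = back[k][i]
    return best[k_best][n], sorted(cuts)

# Hypothetical usage: favour segments whose dialogue acts share a topic suffix.
dialogue = ["greet", "ask_flight", "book_flight", "ask_hotel", "book_hotel"]
score, boundaries = best_segmentation(
    dialogue, max_segments=3,
    segment_score=lambda seg: 1.0 if len(set(t.split("_")[-1] for t in seg)) <= 2 else -1.0)
print(score, boundaries)  # e.g. 3.0 [1, 3, 5]
```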