Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.59

Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition

Abstract: Many studies have applied reinforcement learning to train a dialog policy and have shown great promise in recent years. One common approach is to employ a user simulator to obtain a large number of simulated user experiences for reinforcement learning algorithms. However, modeling a realistic user simulator is challenging. A rule-based simulator requires heavy domain expertise for complex tasks, and a data-driven simulator requires considerable data; it is even unclear how to evaluate a simulator. To avoid explicitly…
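The abstract describes training both dialog roles with multi-agent RL, where each agent's reward mixes a shared task signal with a role-specific one. A minimal sketch of that decomposition, with weights and reward components that are illustrative assumptions rather than the paper's actual values:

```python
# Hedged sketch of role-aware reward decomposition for a two-agent
# (system/user) dialog: each agent's per-turn reward combines a shared
# global signal with a role-specific term. All numbers are assumptions.
from dataclasses import dataclass

@dataclass
class TurnOutcome:
    task_success: bool         # global: did the dialog satisfy the user goal?
    sys_inform_correct: bool   # system role: answered a requested slot correctly
    user_goal_expressed: bool  # user role: stated a pending constraint/request

def decompose_reward(outcome: TurnOutcome,
                     w_global: float = 1.0,
                     w_role: float = 0.5) -> tuple[float, float]:
    """Return (system_reward, user_reward) for one turn.

    Weights and component definitions are illustrative, not the
    values used by Takanobu et al. (2020).
    """
    r_global = 5.0 if outcome.task_success else -1.0     # shared by both agents
    r_sys = 1.0 if outcome.sys_inform_correct else 0.0   # system-specific
    r_user = 1.0 if outcome.user_goal_expressed else 0.0 # user-specific
    return (w_global * r_global + w_role * r_sys,
            w_global * r_global + w_role * r_user)

# A successful turn where the system informed correctly but the user
# expressed no new goal: the agents share the global reward but differ
# in the role term.
sys_r, usr_r = decompose_reward(TurnOutcome(True, True, False))
```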

Cited by 34 publications (21 citation statements)
References 30 publications
“…The literature on joint optimization of the DS and the US is the line of research most relevant to our work. Takanobu et al. (2020) proposed a hybrid value network using MARL (Lowe et al., 2017) with role-aware reward decomposition to optimise the dialogue manager. However, their model requires separate NLU/NLG models to interact via natural language, which hinders its application in transfer learning to new domains.…”
Section: Related Work
confidence: 99%
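The statement above mentions a hybrid value network: a value estimate that combines a shared (global) component with a role-specific one. A toy sketch of that idea, using randomly initialised linear heads as stand-ins for small networks; the linear form and dimensions are assumptions for illustration, not the architecture of Takanobu et al. (2020):

```python
# Toy sketch of a "hybrid" value estimate: V_role(s) = V_shared(s) + V_role-specific(s).
# Linear heads stand in for networks; sizes/seed are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 8

w_shared = rng.normal(size=STATE_DIM)           # shared dialog-level value head
w_role = {"system": rng.normal(size=STATE_DIM), # role-specific heads
          "user": rng.normal(size=STATE_DIM)}

def hybrid_value(state: np.ndarray, role: str) -> float:
    """Combine the shared value with the role-specific value."""
    return float(state @ w_shared + state @ w_role[role])

state = rng.normal(size=STATE_DIM)
v_sys = hybrid_value(state, "system")
v_usr = hybrid_value(state, "user")
# Both agents share the global component; only the role term differs.
```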
“…Prior work has shown benefits from this approach to dialogue policy learning, with a higher success rate at the dialogue level (Liu and Lane, 2017b; Papangelis et al., 2019; Takanobu et al., 2020), but no previous work addresses multi-domain end-to-end dialogue modelling for both agents. Takanobu et al. (2020) address refinement of the dialogue policy alone at the semantic level, but do not address end-to-end system architectures. Liu and Lane (2017b) and Papangelis et al. (2019) address single-domain dialogues (Henderson et al., 2014), but not the more realistic and complex multi-domain dialogues.…”
Section: Introduction
confidence: 99%
“…The dialogue literature widely applies reinforcement learning, including recent work based on deep architectures (Takanobu et al., 2019, 2020; Gordon-Hall et al., 2020a,b). But these task-oriented RL dialogue systems often model the dialogue with limited parameters and assumptions specific to the dataset, targeted for that task.…”
Section: Related Work
confidence: 99%
“…Over the last few years, two promising research directions in ToDs have emerged. The first focuses on a pipeline approach, which consists of modularly connected components (Wu et al., 2019a; Takanobu et al., 2020; Peng et al., 2020). The second direction employs an end-to-end model, which directly uses a sequence-to-sequence (Seq2Seq) model to generate a response from a dialogue history and a corresponding knowledge base (KB) (Eric et al., 2017; Madotto et al., 2018; Wen et al., 2018; Qin et al., 2019b; Wu et al., 2019b; Qin et al., 2020b). In recent years, with the rise of deep neural networks and the evolution of pre-trained language models, research on ToDs has achieved great success.…”
Section: Introduction
confidence: 99%