2019
DOI: 10.48550/arxiv.1902.08858
Preprint

Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models

citations
Cited by 18 publications
(29 citation statements)
references
References 30 publications
“…Though these methods have achieved promising results, they were usually designed for a specific domain, rendering difficulties in generalizing to multi-domains, e.g., the recently proposed multi-domain dataset MultiWoz (Eric et al 2019). Subsequently, there are several models are proposed to handle the multi-domain response generation task (Zhao, Xie, and Eskenazi 2019; Chen et al 2019; Qin et al 2020). To prevent dialog acts growing combinatorially with the number of domains, Chen et al (2019) built a multi-layer hierarchical graph to represent dialog acts to generate responses using BERT-based dialog policy.…”
Section: Related Work (mentioning)
confidence: 99%
“…One drawback of end-to-end neural systems is that training the word-level policy with RL is very difficult due to the large action space. To mitigate this issue, Zhao et al (2019) proposed to learn policy networks in the latent action space. proposed a universal and scalable belief tracker by jointly learning the NLU and DST modules, improving the flexibility of domain ontology configurations.…”
Section: End-to-end Neural Dialog System (mentioning)
confidence: 99%
“…
System | Turns | Inform | Match | Succ.
TSCP (Lei et al, 2018) | 18.20 | 32.0 | 13.68 | 11.8
SUMBT | 13.71 | 44.0 | 46.44 | 27.8
LaRL (Zhao et al, 2019) | 13.08 | 68.0 | 68.95 | 47.7
DAMD (Zhang et al, 2020a) | 11.27 | 64.0 | 59.7 | 48.5
M-GDPL (Takanobu et al, 2019) | …
Table 1: System-wise automatic evaluation performance of dialog turns, inform recall, match rate, and success rate, for all baseline systems and our S-PPO system. S-PPO beats all baselines and achieves state-of-the-art results.…”
Section: Training (mentioning)
confidence: 99%
“…And whether a model can generate diverse (Xu et al, 2018; Baheti et al, 2018), coherent (Li et al, 2016b; Tian et al, 2017; Bosselut et al, 2018; Adiwardana et al, 2020), informative (Shao et al, 2017; Lewis et al, 2017; Ghazvininejad et al, 2017; Young et al, 2017; Zhao et al, 2019) and knowledge-fused (Hua et al, 2020; Zhao et al, 2020; He et al, 2020) responses or not has become metrics to evaluate a dialog generation model. However, the mainly researches described above are developed on textual only and the development of multimodal dialog generation is relatively slow since the lack of large-scale datasets.…”
Section: Dialog Generation (mentioning)
confidence: 99%