Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.163

High-Quality Dialogue Diversification by Intermittent Short Extension Ensembles

Abstract: Many task-oriented dialogue systems use deep reinforcement learning (DRL) to learn policies that respond to the user appropriately and complete the tasks successfully. Training DRL agents with diverse dialogue trajectories prepares them well for rare user requests and unseen situations. One effective diversification method is to let the agent interact with a diverse set of learned user models. However, trajectories created by these artificial user models may contain generation errors, which can quickly propagat…
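To illustrate the setup the abstract describes, the sketch below shows a toy training loop in which a dialogue policy interacts with a pool of learned user models, each of which can inject generation errors. This is a minimal illustration only; the classes, method names, and error model are hypothetical assumptions, not the paper's implementation.

```python
"""Minimal sketch (not the paper's code): a dialogue policy trained
against a pool of learned user models.  All names are hypothetical."""
import random

class LearnedUserModel:
    """Stand-in for a learned user simulator; it occasionally emits an
    erroneous response, mimicking the generation errors noted above."""
    def __init__(self, style, error_rate=0.1):
        self.style, self.error_rate = style, error_rate

    def respond(self, system_action):
        if random.random() < self.error_rate:
            return ("noise", None)            # simulated generation error
        return (self.style, system_action)    # style-dependent reply

class TabularPolicy:
    """Toy policy: picks among a fixed action set and counts visits."""
    ACTIONS = ("request", "inform", "confirm", "bye")

    def __init__(self):
        self.visits = {}

    def act(self, state):
        return random.choice(self.ACTIONS)

    def update(self, trajectory):
        for state, action, _ in trajectory:
            self.visits[(state, action)] = self.visits.get((state, action), 0) + 1

def train(policy, user_models, episodes=100, max_turns=10):
    for _ in range(episodes):
        user = random.choice(user_models)     # diversify: sample a user model per episode
        state, trajectory = user.respond(None), []
        for _ in range(max_turns):
            action = policy.act(state)
            next_state = user.respond(action) # may contain errors that propagate
            trajectory.append((state, action, next_state))
            state = next_state
        policy.update(trajectory)

if __name__ == "__main__":
    users = [LearnedUserModel("cooperative"), LearnedUserModel("terse", 0.3)]
    policy = TabularPolicy()
    train(policy, users)
    print(f"visited {len(policy.visits)} distinct (state, action) pairs")
```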

Cited by 10 publications (5 citation statements) | References 33 publications

Citation statements:
“…In model-based reinforcement learning, where diversity is produced by altering the learning environment, Probabilistic Ensemble Trajectory Sampling (PETS) was developed to learn and employ an ensemble of environment models for planning. ISEE obtains hybrid training trajectories by forking from the original trajectories generated by the expert simulator and extending the new trajectories with the derived simulator, thereby obtaining smoother transition distributions and simplifying the discrete action space [7].…”
Section: Diversification in Dialogues (mentioning, confidence: 99%)
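A minimal sketch of the forking-and-extension idea described in this statement is given below. It assumes duck-typed simulator and policy objects (with respond and act methods, as in the earlier toy sketch); the function names, turn counts, and fork-point choice are illustrative assumptions, not the authors' ISEE implementation.

```python
"""Sketch of building a hybrid trajectory by forking from an expert-simulator
rollout and briefly extending it with a derived simulator (illustrative only)."""
import random

def rollout(simulator, policy, start_state, turns):
    """Generate a trajectory segment of (state, action, next_state) tuples."""
    state, segment = start_state, []
    for _ in range(turns):
        action = policy.act(state)
        next_state = simulator.respond(action)
        segment.append((state, action, next_state))
        state = next_state
    return segment

def hybrid_trajectory(expert_sim, derived_sim, policy, start_state,
                      expert_turns=10, extension_turns=3):
    """Fork at a random point of an expert-simulator trajectory and extend
    it for a few turns with the derived simulator."""
    base = rollout(expert_sim, policy, start_state, expert_turns)
    fork_index = random.randrange(len(base))   # choose the fork point
    fork_state = base[fork_index][2]           # state reached after that turn
    extension = rollout(derived_sim, policy, fork_state, extension_turns)
    return base[:fork_index + 1] + extension
```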
“…The output sequence can be semantic actions or natural language utterances [7,10,15,17,18,37,38]. Tang et al. [35] train USs by supervised learning with different initialisation to create different user policies. Lin et al. [17] proposed GenTUS, an ontology-independent US which generates natural language utterances as well as the underlying semantic actions for interpretability.…”
Section: Related Work (mentioning, confidence: 99%)
“…Adjusting the probability distribution of user actions in rule-based USs is a popular method to address diversity [13], but real users differ in more ways than just action preferences. Training USs by supervised learning with different initialisation [35] or by RL with varying reward functions can also form various user policies [17], but that can only provide diverse extrinsic behaviour, e.g. the action length in each turn or the semantic content.…”
Section: Introduction (mentioning, confidence: 99%)
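The first point in this statement, reweighting the action distribution of a rule-based user simulator, can be illustrated with the short sketch below. The action set, weights, and helper function are hypothetical; it only shows how different action preferences yield different extrinsic behaviour.

```python
"""Illustrative rule-based user simulators that differ only in their
action-sampling distribution."""
from collections import Counter
import random

USER_ACTIONS = ("inform", "request", "confirm", "negate", "bye")

def make_user(action_weights):
    """Return a user simulator that samples its next action from a fixed,
    reweightable distribution over USER_ACTIONS."""
    def user(system_action):
        return random.choices(USER_ACTIONS, weights=action_weights, k=1)[0]
    return user

# Two user profiles differing only in action preferences, i.e. in
# extrinsic behaviour rather than in underlying goals.
chatty_user = make_user([0.5, 0.3, 0.1, 0.05, 0.05])
terse_user = make_user([0.2, 0.2, 0.2, 0.1, 0.3])

if __name__ == "__main__":
    print(Counter(chatty_user("greet") for _ in range(1000)))
    print(Counter(terse_user("greet") for _ in range(1000)))
```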
“…The surge in generative models has catalyzed substantial progress within recommender systems. For example, pre-trained generative models have demonstrated their capability to effectively learn user preferences from historical interactions [1,2]; generative models have shown promise in generating item content that caters to users' diverse information needs in specific contexts [17]; and the emergence of ChatGPT-like language models offers novel interaction modes to obtain users' feedback and intention [8,14].…”
Section: Call for Papers, 4.1 Introduction (mentioning, confidence: 99%)