Interspeech 2017
DOI: 10.21437/interspeech.2017-1060

Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates

Abstract: Deep reinforcement learning dialogue systems are attractive because they can jointly learn their feature representations and policies without manual feature engineering. But their application is challenging due to slow learning. We propose a two-stage method for accelerating the induction of single- or multi-domain dialogue policies. While the first stage reduces the amount of weight updates over time, the second stage uses very limited minibatches (of at most two learning experiences) sampled from experience …
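As a rough illustration of the two stages described in the abstract, the sketch below pairs an epsilon-greedy DQN-style agent with (1) weight updates that are skipped on exploitation steps, so updates become rarer as exploration decays over time, and (2) very small replay minibatches. This is a minimal sketch under assumed names, schedules, and hyper-parameters (the class name, epsilon decay, and replay capacity are all illustrative), not the paper's actual implementation; the gradient step itself is stubbed out.

```python
import random
from collections import deque


class LessUpdatesAgent:
    """Illustrative sketch only: stage 1 skips weight updates on
    exploitation steps (so updates shrink as epsilon decays over time);
    stage 2 trains on very limited minibatches, here two experiences."""

    def __init__(self, q_net, n_actions, minibatch_size=2):
        self.q_net = q_net                    # any callable: state -> action values
        self.n_actions = n_actions
        self.minibatch_size = minibatch_size  # as small as two experiences
        self.replay = deque(maxlen=10000)     # experience replay memory (assumed size)
        self.epsilon = 1.0                    # exploration rate, decayed below
        self.explored = False

    def act(self, state):
        # Epsilon-greedy action selection with a decaying exploration rate.
        self.explored = random.random() < self.epsilon
        self.epsilon = max(0.05, self.epsilon * 0.9999)  # assumed schedule
        if self.explored:
            return random.randrange(self.n_actions)
        values = self.q_net(state)
        return max(range(self.n_actions), key=lambda a: values[a])

    def observe(self, state, action, reward, next_state, done):
        self.replay.append((state, action, reward, next_state, done))
        # Stage 1: update only after exploratory actions; since epsilon
        # decays, the number of weight updates falls over time.
        if self.explored and len(self.replay) >= self.minibatch_size:
            # Stage 2: a very limited minibatch sampled from replay memory.
            batch = random.sample(list(self.replay), self.minibatch_size)
            self.gradient_step(batch)

    def gradient_step(self, batch):
        # A standard DQN update (TD targets + backprop) would go here.
        pass
```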

Cited by 21 publications (12 citation statements). References 19 publications.
“…The outputs of the neural nets are multimodal actions containing words and arm movements. A natural extension to this work is training agents to play multiple games, but this will require more scalable methods combining ideas from [17], [14], [21], [9], [10], [6], [5], [20], [7]. The degrees of engagement using different sets of modalities also remain to be investigated.…”
Section: Discussion (mentioning, confidence: 99%)
“…The system falters on out-of-vocabulary words and hence is difficult to scale to complex scenarios. In [26], the authors proposed a fast DRL approach that uses a network of DQN agents that skips weight updates during exploitation of actions. In [6], the authors proposed a variant of DQN where the VA explores via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network by maintaining a probability distribution over the weights in the network.…”
Section: PLOS ONE (mentioning, confidence: 99%)
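To make the Thompson-sampling idea in the quote above concrete, here is a minimal numpy sketch of a Bayes-by-Backprop-style Q-head: it keeps a Gaussian variational posterior over the weights of a (here, linear) Q-function and acts greedily under one Monte Carlo weight draw. The class name, shapes, and initial values are illustrative assumptions; a real Bayes-by-Backprop network would also fit mu and rho by minimising the variational free energy, which is omitted here.

```python
import numpy as np


class BayesianLinearQHead:
    """Sketch: Thompson sampling over Q-function weights.
    Posterior over weights is N(mu, softplus(rho)^2), as in
    Bayes by Backprop; training of mu/rho is omitted."""

    def __init__(self, state_dim, n_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        self.mu = np.zeros((state_dim, n_actions))        # posterior means
        self.rho = np.full((state_dim, n_actions), -3.0)  # pre-softplus scales

    def sample_weights(self):
        sigma = np.log1p(np.exp(self.rho))     # softplus keeps sigma positive
        eps = self.rng.standard_normal(self.mu.shape)
        return self.mu + sigma * eps           # one Monte Carlo posterior draw

    def act(self, state):
        # Thompson sampling: draw weights once, then act greedily under
        # them; posterior uncertainty drives the exploration.
        w = self.sample_weights()
        return int(np.argmax(state @ w))
```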
“…However, it is limited to small-scale problems, since it requires many hand-crafted features for the state and action space representation (Young, Schatzmann, Weilhammer, & Ye, 2007). Other interesting approaches to statistical DM are based on modelling the system by means of hidden Markov models (HMMs; Cuayáhuitl, Renals, Lemon, & Shimodaira, 2005), stochastic finite-state transducers (Hurtado et al., 2010), Bayesian networks (Meng, Wai, & Pieraccini, 2003), recurrent neural networks (Gao, Galley, & Li, 2019), or deep reinforcement learning (Cuayáhuitl, Keizer, & Lemon, 2015).…”
Section: State of the Art: User-Adapted Conversational Interfaces (mentioning, confidence: 99%)