Scaling up deep reinforcement learning for multi-domain dialogue systems

Cuayáhuitl, Heriberto; Yu, Seunghak; Williamson, Ashley; Carse, Jacob

doi:10.1109/ijcnn.2017.7966275

Cited by 40 publications

(44 citation statements)

References 15 publications


“…word present or absent), the words derived from user responses can be seen as continuous variables by taking ASR confidence scores into account. Our state representations used delexicalised word-based representations and excluded words from information presentation, for increased scalability, as described in [20].…”

Section: Multi-domain Dialogue System (mentioning)

confidence: 99%
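
The statement above describes word-based state features whose values are ASR confidence scores rather than binary present/absent flags. A minimal Python sketch of that idea follows; the function name `encode_words`, the vocabulary, and the delexicalised placeholder `<food>` are hypothetical illustrations, not taken from the cited paper.

```python
def encode_words(vocabulary, recognised):
    """Map each vocabulary word to its ASR confidence score.

    `recognised` maps a recognised word to a confidence in [0, 1].
    Words that were not recognised get 0.0 (absent).
    """
    return [float(recognised.get(word, 0.0)) for word in vocabulary]


# Hypothetical delexicalised vocabulary: slot values are replaced by placeholders.
vocabulary = ["cheap", "<food>", "north"]
features = encode_words(vocabulary, {"cheap": 0.8, "<food>": 0.55})
```

With binary features the first two entries would both be 1; here they carry the recogniser's confidence instead.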

“…Firstly, actions are selected from the most likely actions, Pr(a|s) > 0.0001, derived from Naive Bayes classifiers (for scalability purposes) trained from demonstration dialogues. See example demonstration dialogue in Appendix of [20]. Secondly, the most likely actions in the previous stage are extended with legitimate requests, apologies and confirmations.…”

Section: Multi-domain Dialogue System (mentioning)

confidence: 99%
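
The two action-selection steps quoted above can be sketched as follows. This is only an illustration under stated assumptions: the probabilities are taken as given (the Naive Bayes classifier is not shown), and the function names and action labels are hypothetical.

```python
def filter_likely_actions(action_probs, threshold=0.0001):
    """Keep actions whose probability Pr(a|s) is strictly above the threshold."""
    return [action for action, prob in action_probs.items() if prob > threshold]


def extend_with_legitimate(likely, legitimate):
    """Append legitimate actions (e.g. requests, apologies, confirmations)
    that are not already in the list of likely actions."""
    return likely + [action for action in legitimate if action not in likely]


# Hypothetical probabilities for one dialogue state.
probs = {"Request(food)": 0.62, "Confirm(area)": 0.00005, "Inform(name)": 0.3}
likely = filter_likely_actions(probs)
candidates = extend_with_legitimate(likely, ["Apology()", "Confirm(area)"])
```

In this example `Confirm(area)` is dropped by the threshold in the first step and restored in the second because it is listed as legitimate.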

“…Previous work tackling faster learning of DRL-based agents has used distributed neural nets [16,17], prioritised experience replay by sampling from important previous experiences [18], fast reward propagation [19], and distributed policies to train specialised agents (one per task or domain) [20,21,22]. We propose a method that can be used on top of previous methods, which applies a reduced number of weight updates in two ways.…”

Section: Introduction (mentioning)

confidence: 99%

“…This paper treats neural-based dialogue agents in multiple domains as a network of Deep Reinforcement Learners as proposed in [20], for example by using a network of Deep Q-networks (DQN). A DQN agent aims to find an optimal policy by maximising its cumulative discounted reward defined as…”

Section: Introduction (mentioning)

confidence: 99%
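
The quoted statement is cut off before its formula, so the exact expression used by the citing paper is not reproduced here. As a generic illustration only, the textbook notion of a cumulative discounted reward (each reward weighted by a power of a discount factor) can be computed as below; the function name and the default discount factor are assumptions.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is weighted by gamma**t.

    Computed backwards so each step is a single multiply-add.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1, 1, 1], gamma=0.5)
```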

“…While user responses can motivate transitions to another domain in the network, completing a subdialogue within a domain motivates a transition to the previous domain to resume the interaction. [20] and [22] describe algorithms to train and execute NDQN agents.…”

Section: Introduction (mentioning)

confidence: 99%


Deep Reinforcement Learning of Dialogue Policies with Less Weight Updates

Cuayáhuitl; Yu

Interspeech 2017

Self Cite


Deep reinforcement learning dialogue systems are attractive because they can jointly learn their feature representations and policies without manual feature engineering. But their application is challenging due to slow learning. We propose a two-stage method for accelerating the induction of single or multi-domain dialogue policies. While the first stage reduces the number of weight updates over time, the second stage uses very limited minibatches (of up to two learning experiences) sampled from experience replay memories. The former frequently updates the weights of the neural nets at early stages of training, and decreases the number of updates as training progresses by performing updates during exploration and by skipping updates during exploitation. The learning process is thus accelerated through fewer weight updates in both stages. An empirical evaluation in three domains (restaurants, hotels and TV guide) confirms that the proposed method trains policies 5 times faster than a baseline without the proposed method. Our findings are useful for training larger-scale neural-based spoken dialogue systems.
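
The two ideas in the abstract (update weights only on exploratory steps, and sample very small minibatches from replay memory) can be sketched roughly as below. This is not the authors' implementation: the linear epsilon decay, the function names, and the seed are assumptions made for illustration, and no neural network is trained here.

```python
import random


def sample_minibatch(replay_memory, batch_size=2, rng=random):
    """Draw a very small minibatch (at most `batch_size` experiences)."""
    k = min(batch_size, len(replay_memory))
    return rng.sample(replay_memory, k)


def update_schedule(num_steps, epsilon_start=1.0, epsilon_end=0.0, seed=0):
    """Return one boolean per step: True if a weight update would be performed.

    Updates happen only on exploratory steps; exploitation steps skip them.
    Epsilon decays linearly (an assumed schedule), so updates become rarer
    as training progresses.
    """
    rng = random.Random(seed)
    schedule = []
    for step in range(num_steps):
        fraction = step / max(1, num_steps - 1)
        epsilon = epsilon_start + (epsilon_end - epsilon_start) * fraction
        exploring = rng.random() < epsilon
        schedule.append(exploring)
    return schedule


updates = update_schedule(1000)
early = sum(updates[:500])   # updates in the first half of training
late = sum(updates[500:])    # updates in the second half of training
batch = sample_minibatch(list(range(10)))
```

Under this assumed schedule most updates fall early in training, which mirrors the abstract's description of frequent early updates that taper off.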
