Hybridisation of expertise and reinforcement learning in dialogue systems

Laroche, Romain; Putois, Ghislain; Bretier, Philippe; Bouchon‐Meunier, Bernadette

doi:10.21437/interspeech.2009-660

Cited by 7 publications

(3 citation statements)

References 10 publications

“…The first action consists of informing the user about the price of the regal resort and the second action consists of proposing another option, Hotel Globetrotter. Performing more than one action per turn is a challenge when using reinforcement learning (Fatemi et al, 2016; Gašić et al, 2012; Pietquin et al, 2011) and, to our knowledge, this has only been done in a simulated setting (Laroche et al, 2009).…”

Section: Dialogue Management (mentioning)

confidence: 99%

Frames: a corpus for adding memory to goal-oriented dialogue systems

Asri, Schulz, Sharma et al. 2017

Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue


This paper proposes a new dataset, Frames, composed of 1369 human-human dialogues with an average of 15 turns per dialogue. This corpus contains goal-oriented dialogues between users who are given some constraints to book a trip and assistants who search a database to find appropriate trips. The users exhibit complex decision-making behaviour which involves comparing trips, exploring different options, and selecting among the trips that were discussed during the dialogue. To drive research on dialogue systems towards handling such behaviour, we have annotated and released the dataset and we propose in this paper a task called frame tracking. This task consists of keeping track of different semantic frames throughout each dialogue. We propose a rule-based baseline and analyse the frame tracking task through this baseline.

“…R : S → ℝ, the immediate reward stochastic function, defines the goal(s). In some settings such as dialogue systems (Laroche et al 2009; Lemon and Pietquin 2012) or board games (Tesauro 1995; Silver et al 2016), R can be inferred directly from the state by the agent, and in some others such as in robotics and in Atari games (Mnih et al 2013;, R is generally unknown. Finally, γ ∈ [0, 1) the discount factor is a parameter given to the RL optimisation algorithm favouring short-term rewards.…”

Section: Introduction (mentioning)

confidence: 99%

Transfer Reinforcement Learning with Shared Dynamics

Laroche, Barlier 2017

AAAI

Self Cite


This article addresses a particular Transfer Reinforcement Learning (RL) problem: when dynamics do not change from one task to another, and only the reward function does. Our method relies on two ideas, the first one is that transition samples obtained from a task can be reused to learn on any other task: an immediate reward estimator is learnt in a supervised fashion and for each sample, the reward entry is changed by its reward estimate. The second idea consists in adopting the optimism in the face of uncertainty principle and to use upper bound reward estimates. Our method is tested on a navigation task, under four Transfer RL experimental settings: with a known reward function, with strong and weak expert knowledge on the reward function, and with a completely unknown reward function. It is also evaluated in a Multi-Task RL experiment and compared with the state-of-the-art algorithms. Results reveal that this method constitutes a major improvement for transfer/multi-task problems that share dynamics.


“…Research on negotiation dialogue experiences a growth of interest. At first, Reinforcement Learning (Sutton and Barto, 1998), the most popular framework for dialogue management in dialogue systems (Levin and Pieraccini, 1997; Laroche et al, 2009; Lemon and Pietquin, 2012), was applied to negotiation with mitigated results (English and Heeman, 2005; Georgila and Traum, 2011; Lewis et al, 2017), because the non-stationary policy of the opposing player prevents those algorithms from converging consistently.…”

Section: Introduction (mentioning)

confidence: 99%