2011
DOI: 10.1145/1966407.1966412

Sample-efficient batch reinforcement learning for dialogue management optimization

Abstract: Spoken Dialogue Systems (SDS) are systems which have the ability to interact with human beings using natural language as the medium of interaction. A dialogue policy plays a crucial role in determining the functioning of the dialogue management module. Handcrafting the dialogue policy is not always an option, considering the complexity of the dialogue task and the stochastic behavior of users. In recent years approaches based on Reinforcement Learning (RL) for policy optimization in dialogue management have be…

Citations: cited by 71 publications (54 citation statements)
References: 15 publications
“…The Scalability Challenge is again related to the current absence of powerful methods in the field that can deal with large amounts of data or knowledge in a scalable way. Several approaches exist that aim to tackle complex and ambitious goals (Janarthanam and Lemon 2010; Rieser et al 2010; Pietquin et al 2011), but they are currently confined to small-scale applications. 5.…”
Section: Results (mentioning)
confidence: 99%
“…We use KTD-Q (Kalman Temporal Difference Q-learning (Geist and Pietquin, 2010)) to learn the dialog policy as it was designed to satisfy some of these properties and tested in a dialog system with simulated users (Pietquin et al, 2011). The properties we wished to be satisfied by the algorithm were the following:…”
Section: Dialog Strategy Learning (mentioning)
confidence: 99%
“…One of the solutions proposed is to use batch RL [19] or hybrid learning [20] to learn DM policies directly on dialog (s, a) corpora. We propose a fitness function to support batch-style policy learning using GA. First, a batch RL algorithm is performed on the corpus, inducing an estimated optimum Q-function $\hat{Q}(s, a)$, and a corresponding implicitly defined policy $\pi_{\hat{Q}}(s) = \arg\max_a \hat{Q}(s, a)$.…”
Section: Q-points Regression On Dialog Corpora (mentioning)
confidence: 99%
“…Fitted Q-iteration (FQI) [21] is utilized for batch RL and is described in Algorithm 2. FQI has been applied to DM policies optimization and shows high data-efficiency [19]. The inputs to the algorithm are state-action-next-state triplets in the form of $\{(s_{i,t}, a_{i,t}, s_{i,t+1})\}$.…”
Section: Q-points Regression On Dialog Corpora (mentioning)
confidence: 99%
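
The last two citation statements describe fitted Q-iteration over a fixed corpus of (state, action, next state) triplets, followed by a greedy policy taken as the arg max of the estimated Q-function. A minimal Python sketch of that idea follows. It is not the cited papers' Algorithm 2. It assumes a tabular setting in which the regression step reduces to averaging the targets per (state, action) pair, and it assumes a reward function is available separately, since the triplets carry no reward. The toy states, actions, `reward_fn`, and every other name here are hypothetical.

```python
def fitted_q_iteration(transitions, reward_fn, actions, gamma=0.9, n_iters=50):
    """Tabular sketch of fitted Q-iteration on a fixed batch.

    transitions: list of (s, a, s_next); s_next is None for a terminal step.
    reward_fn:   assumed reward function r(s, a, s_next), not part of the triplets.
    """
    q = {}
    for _ in range(n_iters):
        sums, counts = {}, {}
        for s, a, s_next in transitions:
            # Bootstrapped target: r + gamma * max_b Q(s_next, b)
            if s_next is None:
                best_next = 0.0
            else:
                best_next = max(q.get((s_next, b), 0.0) for b in actions)
            target = reward_fn(s, a, s_next) + gamma * best_next
            sums[(s, a)] = sums.get((s, a), 0.0) + target
            counts[(s, a)] = counts.get((s, a), 0) + 1
        # "Regression" step: in the tabular case, the least-squares fit
        # is the mean target for each (state, action) pair.
        q = {key: sums[key] / counts[key] for key in sums}
    return q


def greedy_policy(q, actions):
    """Policy implicitly defined by Q: pi(s) = argmax_a Q(s, a)."""
    return lambda s: max(actions, key=lambda a: q.get((s, a), 0.0))


# Hypothetical toy dialog corpus: the system may ask for information or close.
ACTIONS = ["ask", "close"]
corpus = [
    ("start", "ask", "confirm"),
    ("start", "close", None),
    ("confirm", "ask", "confirm"),
    ("confirm", "close", None),
]


def reward_fn(s, a, s_next):
    # Assumed reward: closing after confirmation succeeds, closing early fails.
    if a == "close":
        return 1.0 if s == "confirm" else -1.0
    return 0.0


q = fitted_q_iteration(corpus, reward_fn, ACTIONS)
policy = greedy_policy(q, ACTIONS)
print(policy("start"), policy("confirm"))
```

With a function approximator in place of the per-pair average, the same loop would refit a regressor on the (state, action) → target pairs at each iteration.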