2012
DOI: 10.1109/jstsp.2012.2229257
A Comprehensive Reinforcement Learning Framework for Dialogue Management Optimization

Abstract: Reinforcement learning is now an acknowledged approach for optimizing the interaction strategy of spoken dialogue systems. While the first algorithms considered were quite basic (like SARSA), recent work has concentrated on more sophisticated methods, paying more attention to off-policy learning, the exploration-exploitation dilemma, sample efficiency, and handling non-stationarity. New algorithms have been proposed to address these issues and have been applied to dialogue management. However, each…

Cited by 49 publications (36 citation statements)
References 34 publications
“…Standard Q-learning (Watkins, 1989) has also been tested as an online algorithm but unsuccessfully, which is compliant with previous works (e.g. (Daubigney et al, 2012)). …”
Section: Learning (supporting)
confidence: 66%
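To make the “standard Q-learning” (Watkins, 1989) referenced in this statement concrete, here is a minimal tabular sketch of its off-policy update rule. The dialogue states, actions, reward, and hyperparameters below are hypothetical illustrations, not drawn from the cited systems:

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One off-policy TD step: bootstrap from the greedy next action,
    regardless of which action the behavior policy actually takes."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q

# Hypothetical dialogue-management example: system actions at each turn.
Q = defaultdict(float)          # unseen (state, action) pairs start at 0
actions = ["ask", "confirm", "close"]
q_learning_update(Q, "greet", "ask", 1.0, "slot_filled", actions)
```

Because the target uses the max over next actions rather than the action actually taken, the learned values follow the greedy policy even under exploratory behavior, which is precisely what makes online convergence delicate in practice.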
“…The choice of policy learning algorithm is important because learning POMDP policies is challenging and dialog applications exhibit properties not often encountered in other reinforcement learning applications (Daubigney et al, 2012). We use KTD-Q (Kalman Temporal Difference Q-learning (Geist and Pietquin, 2010)) to learn the dialog policy as it was designed to satisfy some of these properties and tested in a dialog system with simulated users (Pietquin et al, 2011).…”
Section: Dialog Strategy Learning (mentioning)
confidence: 99%
“…Many data-driven methods have been proposed among which RL-based ones are the most popular. The GP-SARSA [8] and KTD [9] algorithms are two of the representative methods based on online RL. These methods focus on high performance and sample-efficient online learning.…”
Section: Related Work (mentioning)
confidence: 99%
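For contrast with Q-learning above, the tabular SARSA rule underlying sample-efficient online methods such as GP-SARSA can be sketched as follows. The names and numbers are hypothetical, chosen only to illustrate the on-policy update:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One on-policy TD step: bootstrap from the action actually selected
    next (a_next), rather than from the greedy max as in Q-learning."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q

# Hypothetical dialogue turn: the behavior policy chose "confirm" next.
Q = defaultdict(float)          # unseen (state, action) pairs start at 0
sarsa_update(Q, "greet", "ask", 0.5, "slot_filled", "confirm")
```

Because the update evaluates the policy actually being followed, exploratory actions influence the learned values directly, which tends to make on-policy learning better behaved during live interaction with users.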