Sample-efficient batch reinforcement learning for dialogue management optimization

Pietquin, Olivier; Geist, Matthieu; Chandramohan, Senthilkumar; Frezza-Buet, Hervé

doi:10.1145/1966407.1966412

Cited by 71 publications

(54 citation statements)

References 15 publications


“…The Scalability Challenge is again related to the current absence of powerful methods in the field that can deal with large amounts of data or knowledge in a scalable way. Several approaches exist that aim to tackle complex and ambitious goals (Janarthanam and Lemon 2010; Rieser et al 2010; Pietquin et al 2011), but they are currently confined to small-scale applications. 5.…”

Section: Results (mentioning)

confidence: 99%

Context‐Sensitive Natural Language Generation: From Knowledge‐Driven to Data‐Driven Techniques

Dethlefs

2014

Language and Linguist. Compass


Context-sensitive Natural Language Generation is concerned with the automatic generation of system output that is in several ways adaptive to its target audience or the situational circumstances of its production. In this article, I will provide an overview of the most popular methods that have been applied to context-sensitive generation. A particular focus will be on the shift from knowledge-driven to data-driven approaches that has been witnessed in the last decade. While this shift has offered powerful new methods for large-scale adaptivity and flexible output generation, purely data-driven approaches still struggle to reach the linguistic depth of their knowledge-driven predecessors. Bridging the gap between both types of approaches is therefore an important future research direction.


“…We use KTD-Q (Kalman Temporal Difference Q-learning (Geist and Pietquin, 2010)) to learn the dialog policy as it was designed to satisfy some of these properties and tested in a dialog system with simulated users (Pietquin et al, 2011). The properties we wished to be satisfied by the algorithm were the following:…”

Section: Dialog Strategy Learning (mentioning)

confidence: 99%

Integrated Learning of Dialog Strategies and Semantic Parsing

Padmakumar; Thomason; Mooney

2017

Proceedings of the 15th Conference of the European Chapter of The Association for Computational Linguistics: Volume 1


Natural language understanding and dialog management are two integral components of interactive dialog systems. Previous research has used machine learning techniques to individually optimize these components, with different forms of direct and indirect supervision. We present an approach to integrate the learning of both a dialog strategy using reinforcement learning, and a semantic parser for robust natural language understanding, using only natural dialog interaction for supervision. Experimental results on a simulated task of robot instruction demonstrate that joint learning of both components improves dialog performance over learning either of these components alone.


“…One of the solutions proposed is to use batch RL [19] or hybrid learning [20] to learn DM policies directly on dialog (s, a) corpora. We propose a fitness function to support batch-style policy learning using GA. First, a batch RL algorithm is performed on the corpus, inducing an estimated optimum Q-function Q(s, a), and a corresponding implicitly defined policy π_Q(s) = arg max_a Q(s, a).…”

Section: Q-points Regression On Dialog Corpora (mentioning)

confidence: 99%

“…Fitted Q-iteration (FQI) [21] is utilized for batch RL and is described in Algorithm 2. FQI has been applied to DM policies optimization and shows high data-efficiency [19]. The inputs to the algorithm are state-action-next-state triplets in the form of {(s_{i,t}, a_{i,t}, s_{i,t+1})}.…”

Section: Q-points Regression On Dialog Corpora (mentioning)

confidence: 99%
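
The two statements above describe fitted Q-iteration over a fixed batch of state-action-next-state triplets, followed by a greedy policy read off the resulting Q estimate. The following is a minimal sketch of that idea, not the algorithm as given in the cited papers. It assumes a tabular setting in which the regression step is just a per-(s, a) average of the targets, and the reward function, discount factor, state names, and action names are all hypothetical.

```python
from collections import defaultdict


def fitted_q_iteration(transitions, reward_fn, actions, terminal,
                       gamma=0.9, n_iters=50):
    """Tabular fitted Q-iteration on a fixed batch of (s, a, s_next) triplets.

    Assumption: the "fit" step is a per-(s, a) mean of the targets, which
    stands in for the regressor a real system would use.
    """
    q = defaultdict(float)  # Q_0 is zero everywhere
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Build regression targets r + gamma * max_b Q(s_next, b).
        for s, a, s_next in transitions:
            if s_next in terminal:
                future = 0.0
            else:
                future = max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += reward_fn(s, a, s_next) + gamma * future
            counts[(s, a)] += 1
        # Fit step: average the targets observed for each (s, a) pair.
        new_q = defaultdict(float)
        for key, total in sums.items():
            new_q[key] = total / counts[key]
        q = new_q
    return q


def greedy_policy(q, state, actions):
    """Policy implied by Q: pick the action with the highest Q(state, a)."""
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical toy corpus: asking first leads to a successful close.
actions = ["ask", "close"]
terminal = {"done", "fail"}
transitions = [
    ("start", "ask", "mid"),
    ("start", "close", "fail"),
    ("mid", "ask", "mid"),
    ("mid", "close", "done"),
]


def reward_fn(s, a, s_next):
    # Hypothetical reward: +1 for task success, -1 for failure, 0 otherwise.
    return {"done": 1.0, "fail": -1.0}.get(s_next, 0.0)


q = fitted_q_iteration(transitions, reward_fn, actions, terminal)
policy = {s: greedy_policy(q, s, actions) for s in ("start", "mid")}
```

On this toy corpus the greedy policy asks in the start state and closes in the mid state. The batch is never extended during learning, which is the property that makes the approach usable on a fixed dialog corpus.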

Policy Optimization for Spoken Dialog Management Using Genetic Algorithm

Ren; Zhao; Yan

2016

IEICE Trans. Inf. & Syst.


SUMMARY: The optimization of spoken dialog management policies is a non-trivial task due to the erroneous inputs from speech recognition and language understanding modules. The dialog manager needs to ground uncertain semantic information at times to fully understand the need of human users and successfully complete the required dialog tasks. Approaches based on reinforcement learning are currently mainstream in academia and have been proved to be effective, especially when operating in noisy environments. However, in reinforcement learning the dialog strategy is often represented by a complex numeric model and thus is incomprehensible to humans. The trained policies are very difficult for dialog system designers to verify or modify, which largely limits the deployment for commercial applications. In this paper we propose a novel framework for optimizing dialog policies specified in human-readable domain language using a genetic algorithm. We present learning algorithms using a user simulator and real human-machine dialog corpora. Empirical experimental results show that the proposed approach can achieve competitive performance on par with some state-of-the-art reinforcement learning algorithms, while maintaining a comprehensible policy structure.

Key words: spoken dialog management, spoken dialog system, genetic algorithm

