Kavosh Asadi scite author profile

End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state. In addition, HCNs can be optimized with supervised learning, reinforcement learning, or a mixture of both. HCNs attain state-of-the-art performance on the bAbI dialog dataset (Bordes and Weston, 2016), and outperform two commercially deployed customer-facing dialog systems.

State Abstraction as Compression in Apprenticeship Learning

Abel, Arumugam, Asadi, et al. 2019. AAAI.

State abstraction can give rise to models of environments that are both compressed and useful, thereby enabling efficient sequential decision making. In this work, we offer the first formalism and analysis of the trade-off between compression and performance made in the context of state abstraction for Apprenticeship Learning. We build on Rate-Distortion theory, the classic Blahut-Arimoto algorithm, and the Information Bottleneck method to develop an algorithm for computing state abstractions that approximate the optimal tradeoff between compression and performance. We illustrate the power of this algorithmic structure to offer insights into effective abstraction, compression, and reinforcement learning through a mixture of analysis, visuals, and experimentation.
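
As a rough illustration of one ingredient the abstract names, the sketch below implements the classic Blahut-Arimoto iteration for a rate-distortion problem. It is not the paper's algorithm for state abstraction: the function name `blahut_arimoto`, the toy Hamming distortion, and the trade-off parameter `beta` are assumptions chosen for this example only.

```python
import math


def blahut_arimoto(p_x, distortion, beta, iters=200):
    """Classic Blahut-Arimoto iteration for rate-distortion (illustrative sketch).

    p_x        : source distribution over symbols x
    distortion : distortion[x][k] is the cost of encoding x as codeword k
    beta       : trade-off parameter; larger beta favors low distortion
    Returns (cond, q_hat): the encoder q(k | x) and the codeword marginal q(k).
    """
    n_x = len(p_x)
    n_hat = len(distortion[0])
    q_hat = [1.0 / n_hat] * n_hat  # start from a uniform codeword marginal
    cond = [list(q_hat) for _ in range(n_x)]
    for _ in range(iters):
        # Encoder update: q(k | x) is proportional to q(k) * exp(-beta * d(x, k)).
        cond = []
        for x in range(n_x):
            weights = [q_hat[k] * math.exp(-beta * distortion[x][k])
                       for k in range(n_hat)]
            total = sum(weights)
            cond.append([w / total for w in weights])
        # Marginal update: q(k) = sum over x of p(x) * q(k | x).
        q_hat = [sum(p_x[x] * cond[x][k] for x in range(n_x))
                 for k in range(n_hat)]
    return cond, q_hat


# Toy problem (hypothetical): two equiprobable symbols with Hamming distortion.
hamming = [[0.0, 1.0], [1.0, 0.0]]
sharp_encoder, _ = blahut_arimoto([0.5, 0.5], hamming, beta=10.0)
flat_encoder, _ = blahut_arimoto([0.5, 0.5], hamming, beta=0.0)
```

With a large `beta` the encoder maps each symbol almost deterministically to its own codeword (little compression, low distortion); with `beta = 0` the encoder ignores the input entirely (maximal compression). The compression-performance trade-off described in the abstract is analogous in spirit, with states in place of symbols.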

DeepMellow: Removing the Need for a Target Network in Deep Q-Learning

Asadi, Littman, et al. 2019.

Deep Q-Network (DQN) is an algorithm that achieves human-level performance in complex domains like Atari games. One of the important elements of DQN is its use of a target network, which is necessary to stabilize learning. We argue that using a target network is incompatible with online reinforcement learning, and it is possible to achieve faster and more stable learning without a target network when we use Mellowmax, an alternative softmax operator. We derive novel properties of Mellowmax, and empirically show that the combination of DQN and Mellowmax, but without a target network, outperforms DQN with a target network.
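
The abstract does not spell out the operator, so the following is a minimal sketch that assumes the usual definition of Mellowmax, mm_ω(x) = log((1/n) Σ exp(ω·x_i)) / ω. The function name `mellowmax`, the example values, and the choices of `omega` are hypothetical and not taken from the paper.

```python
import math


def mellowmax(values, omega):
    """Mellowmax of a list of action values (sketch under the assumed definition).

    Computes log(mean(exp(omega * v))) / omega using the log-sum-exp trick
    so that large omega * v does not overflow.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if omega == 0:
        # The limit of Mellowmax as omega goes to 0 is the arithmetic mean.
        return sum(values) / len(values)
    scaled = [omega * v for v in values]
    shift = max(scaled)
    log_sum = shift + math.log(sum(math.exp(s - shift) for s in scaled))
    return (log_sum - math.log(len(values))) / omega


# Hypothetical action values for a single state.
q_values = [0.0, 1.0]
near_max = mellowmax(q_values, omega=100.0)   # approaches max(q_values)
near_mean = mellowmax(q_values, omega=1e-6)   # approaches the mean
```

In a DQN-style update, an operator like this would replace the hard max over next-state action values when forming the bootstrapped target; how that is combined with the rest of the training loop is beyond what the abstract states.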

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

Williams, Asadi, Zweig. 2017. Preprint.

Lipschitz Continuity in Model-based Reinforcement Learning

Asadi, Misra, Littman. 2018. Preprint.
