Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1062

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

Abstract: End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state.
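As a rough orientation, the sketch below wires an RNN together with hand-written domain code in the way the abstract describes: a developer-supplied feature vector and action mask stand in for the "domain-specific knowledge encoded as software", and the output is a distribution over system action templates. All names, shapes, and sizes are illustrative assumptions, not the authors' implementation.

# Sketch of an HCN-style decision step (illustrative; not the authors' code).
import torch
import torch.nn as nn

NUM_TEMPLATES = 12   # hypothetical number of system action templates
FEATURE_DIM = 300    # hypothetical size of the per-turn feature vector

rnn = nn.LSTMCell(FEATURE_DIM, 128)    # maintains the latent dialog state
head = nn.Linear(128, NUM_TEMPLATES)   # one score per action template

def hcn_step(turn_features, action_mask, h, c):
    # turn_features: (1, FEATURE_DIM) features built by developer code
    # action_mask:   (1, NUM_TEMPLATES) 0/1 vector from domain code
    h, c = rnn(turn_features, (h, c))
    logits = head(h)
    logits = logits.masked_fill(action_mask == 0, float("-inf"))
    probs = torch.softmax(logits, dim=-1)  # distribution over templates
    return probs, h, c

# Toy usage with zero-initialized state and all actions permitted.
h, c = torch.zeros(1, 128), torch.zeros(1, 128)
probs, h, c = hcn_step(torch.randn(1, FEATURE_DIM), torch.ones(1, NUM_TEMPLATES), h, c)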

Cited by 274 publications (151 citation statements)
References 30 publications
“…Recently, deep reinforcement learning (DRL) [34] has been investigated for dialogue policy optimization, e.g. Deep Q-Networks (DQN) [8]-[10], [13], [20], [35]-[37], policy gradient methods [7], [11], and actor-critic approaches [9]. However, compared with GPRL and KTD-RL, most of these deep models are not sample-efficient.…”
Section: B. Dialogue Policy Optimization
confidence: 99%
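The statement above surveys value-based DRL for dialogue policies. For orientation, here is a minimal DQN-style policy sketch in PyTorch; the state/action sizes, network shape, and epsilon-greedy/TD details are illustrative assumptions, not taken from the cited papers.

# Minimal sketch of a DQN-style dialogue policy (illustrative assumptions).
import random
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 64, 10  # hypothetical dialog-state and action sizes

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, NUM_ACTIONS),  # one Q-value per system action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(state, epsilon=0.1):
    # Epsilon-greedy selection over the Q-values.
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def td_update(state, action, reward, next_state, gamma=0.99):
    # One temporal-difference update on a single transition.
    q = q_net(state)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max()
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The sample-inefficiency the authors note is visible in this setup: each TD update uses a single transition, so thousands of dialogs are typically needed before the Q-estimates are useful.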
“…• Once an action is chosen, it is conveyed to the environment, a reward is observed as described at the end of this section, and the agent's partner response is observed in order to update the dialogue history H (lines 11-14).…”
Section: Ensemble of DRL Chatbots
confidence: 99%
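The interaction loop that statement describes (choose an action, convey it, observe a reward and the partner's response, update the history H) can be sketched as follows; the agent and environment interfaces here are stand-ins, not the cited system's actual API.

# Sketch of the choose/convey/observe/update loop (illustrative interfaces).
import random

class RandomAgent:
    # Stand-in agent: picks among a few canned actions.
    ACTIONS = ["greet", "ask_slot", "confirm", "bye"]
    def choose(self, history):
        return random.choice(self.ACTIONS)

class ToyEnvironment:
    # Stand-in environment: returns a reward and a partner response.
    def step(self, action):
        reward = 1.0 if action == "bye" else -0.1  # toy per-turn reward
        partner = f"user reply to {action}"
        return reward, partner, action == "bye"

def run_dialogue(agent, env, max_turns=20):
    history = []                                 # dialogue history H
    for _ in range(max_turns):
        action = agent.choose(history)           # choose an action
        reward, partner, done = env.step(action) # convey it; observe reward
        history.append((action, partner))        # update H with both sides
        if done:
            break
    return history

print(run_dialogue(RandomAgent(), ToyEnvironment()))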
“…For our first baseline, we choose a relatively simple architecture: a multi-layer feed-forward neural network [14,15]. This network is applied to each dialog and candidate answer to yield a confidence metric defined in the interval [0,1].…”
Section: Baselines Considered
confidence: 99%
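A minimal sketch of such a baseline is shown below: a feed-forward network scores a (dialog, candidate answer) pair and a sigmoid squashes the score into [0,1]. The feature construction (simple concatenation of fixed-size vectors) and all sizes are assumptions, not details from the cited paper.

# Sketch of a feed-forward scorer over (dialog, candidate) pairs.
import torch
import torch.nn as nn

DIALOG_DIM = CANDIDATE_DIM = 128      # hypothetical encoding sizes
FEATURE_DIM = DIALOG_DIM + CANDIDATE_DIM

scorer = nn.Sequential(
    nn.Linear(FEATURE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),                     # confidence in [0, 1]
)

def confidence(dialog_vec, candidate_vec):
    # Score one candidate answer against the dialog context.
    features = torch.cat([dialog_vec, candidate_vec])
    return float(scorer(features))

print(confidence(torch.randn(DIALOG_DIM), torch.randn(CANDIDATE_DIM)))

At inference time the candidate with the highest confidence would be selected as the system response.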
“…It is interesting to note that, as suggested in [1,2], methodologies based on rules may solve the goal-oriented dialog problem proposed in DSTC6 with full accuracy (i.e., no errors at all). By contrast, data-driven conversational systems (see, e.g., [3,4,5]) are typically easier to apply to new domains and often perform satisfactorily, but are usually less accurate than rule-based methods.…”
Section: Introduction
confidence: 99%