Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1062

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

Abstract: End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state.
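As a rough orientation, the sketch below wires an RNN together with hand-written domain code in the way the abstract describes: a developer-supplied feature vector and action mask stand in for the "domain-specific knowledge encoded as software", and the output is a distribution over system action templates. All names, shapes, and sizes are illustrative assumptions, not the authors' implementation.

# Sketch of an HCN-style decision step (illustrative; not the authors' code).
import torch
import torch.nn as nn

NUM_TEMPLATES = 12   # hypothetical number of system action templates
FEATURE_DIM = 300    # hypothetical size of the per-turn feature vector

rnn = nn.LSTMCell(FEATURE_DIM, 128)    # maintains the latent dialog state
head = nn.Linear(128, NUM_TEMPLATES)   # one score per action template

def hcn_step(turn_features, action_mask, h, c):
    # turn_features: (1, FEATURE_DIM) features built by developer code
    # action_mask:   (1, NUM_TEMPLATES) 0/1 vector from domain code
    h, c = rnn(turn_features, (h, c))
    logits = head(h)
    logits = logits.masked_fill(action_mask == 0, float("-inf"))
    probs = torch.softmax(logits, dim=-1)  # distribution over templates
    return probs, h, c

# Toy usage with zero-initialized state and all actions permitted.
h, c = torch.zeros(1, 128), torch.zeros(1, 128)
probs, h, c = hcn_step(torch.randn(1, FEATURE_DIM), torch.ones(1, NUM_TEMPLATES), h, c)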

Cited by 274 publications (151 citation statements)
References 30 publications
“…Recently, deep reinforcement learning (DRL) [34] has been investigated for dialogue policy optimization, e.g. Deep Q-Networks (DQN) [8]-[10], [13], [20], [35]-[37], policy gradient methods [7], [11], and actor-critic approaches [9]. However, compared with GPRL and KTD-RL, most of these deep models are not sample-efficient.…”
Section: B. Dialogue Policy Optimization
confidence: 99%
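The statement above surveys value-based DRL for dialogue policies. For orientation, here is a minimal DQN-style policy sketch in PyTorch; the state/action sizes, network shape, and epsilon-greedy/TD details are illustrative assumptions, not taken from the cited papers.

# Minimal sketch of a DQN-style dialogue policy (illustrative assumptions).
import random
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 64, 10  # hypothetical dialog-state and action sizes

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, NUM_ACTIONS),  # one Q-value per system action
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(state, epsilon=0.1):
    # Epsilon-greedy selection over the Q-values.
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def td_update(state, action, reward, next_state, gamma=0.99):
    # One temporal-difference update on a single transition.
    q = q_net(state)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max()
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The sample-inefficiency the authors note is visible in this setup: each TD update uses a single transition, so thousands of dialogs are typically needed before the Q-estimates are useful.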
“…• Once an action is chosen, it is conveyed to the environment, a reward is observed as described at the end of this section, and the agent's partner response is observed in order to update the dialogue history H (lines 11-14).…”
Section: Ensemble of DRL Chatbots
confidence: 99%
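The interaction loop that statement describes (choose an action, convey it, observe a reward and the partner's response, update the history H) can be sketched as follows; the agent and environment interfaces here are stand-ins, not the cited system's actual API.

# Sketch of the choose/convey/observe/update loop (illustrative interfaces).
import random

class RandomAgent:
    # Stand-in agent: picks among a few canned actions.
    ACTIONS = ["greet", "ask_slot", "confirm", "bye"]
    def choose(self, history):
        return random.choice(self.ACTIONS)

class ToyEnvironment:
    # Stand-in environment: returns a reward and a partner response.
    def step(self, action):
        reward = 1.0 if action == "bye" else -0.1  # toy per-turn reward
        partner = f"user reply to {action}"
        return reward, partner, action == "bye"

def run_dialogue(agent, env, max_turns=20):
    history = []                                 # dialogue history H
    for _ in range(max_turns):
        action = agent.choose(history)           # choose an action
        reward, partner, done = env.step(action) # convey it; observe reward
        history.append((action, partner))        # update H with both sides
        if done:
            break
    return history

print(run_dialogue(RandomAgent(), ToyEnvironment()))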
“…For our first baseline, we choose a relatively simple architecture: a multi-layer feed-forward neural network [14,15]. This network is applied to each dialog and candidate answer to yield a confidence metric defined in the interval [0,1].…”
Section: Baselines Considered
confidence: 99%
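A minimal sketch of such a baseline is shown below: a feed-forward network scores a (dialog, candidate answer) pair and a sigmoid squashes the score into [0,1]. The feature construction (simple concatenation of fixed-size vectors) and all sizes are assumptions, not details from the cited paper.

# Sketch of a feed-forward scorer over (dialog, candidate) pairs.
import torch
import torch.nn as nn

DIALOG_DIM = CANDIDATE_DIM = 128      # hypothetical encoding sizes
FEATURE_DIM = DIALOG_DIM + CANDIDATE_DIM

scorer = nn.Sequential(
    nn.Linear(FEATURE_DIM, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),                     # confidence in [0, 1]
)

def confidence(dialog_vec, candidate_vec):
    # Score one candidate answer against the dialog context.
    features = torch.cat([dialog_vec, candidate_vec])
    return float(scorer(features))

print(confidence(torch.randn(DIALOG_DIM), torch.randn(CANDIDATE_DIM)))

At inference time the candidate with the highest confidence would be selected as the system response.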
“…It is interesting to note that, as suggested in [1,2], methodologies based on rules may solve the goal-oriented dialog problem proposed in DSTC6 with full accuracy (i.e., no errors at all). By contrast, data-driven conversational systems (see, e.g., [3,4,5]) are typically easier to apply to new domains and often perform satisfactorily, but are usually less accurate than rule-based methods.…”
Section: Introduction
confidence: 99%