Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence 2017
DOI: 10.24963/ijcai.2017/385

End-to-end optimization of goal-driven and visually grounded dialogue systems

Abstract: End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision may fail to correctly render the planning problem inherent to dialogue as well as its contextual and grounded nature. In th…

Cited by 82 publications (120 citation statements); references 4 publications.
“…We ask three human subjects to play on the same split and the game is recognised as successful if at least two of them give the right answer. In our experiment, the average performance of humans was 79% compared to 52% and 70% for the supervised [9] and RL [26] models. We are even better than a model proposed in [37] (76%), which has three complex hand-crafted rewards.…”
Section: GuessWhat?!
confidence: 79%
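The evaluation protocol quoted above can be made concrete with a short sketch. The following Python snippet (hypothetical data and function names, not taken from the cited papers) computes the human success rate under the majority-vote rule: a game counts as solved when at least two of the three annotators pick the target object.

    def game_success(annotator_guesses, target_id):
        # A game is successful if at least two of the three human
        # annotators guessed the target object.
        correct = sum(1 for guess in annotator_guesses if guess == target_id)
        return correct >= 2

    def human_accuracy(games):
        # games: list of (three_guesses, target_id) pairs.
        successes = sum(game_success(guesses, target) for guesses, target in games)
        return successes / len(games)

    # Hypothetical example: the first game succeeds (2/3 correct),
    # the second fails (1/3 correct), giving an accuracy of 0.5.
    games = [(["obj_3", "obj_3", "obj_1"], "obj_3"),
             (["obj_2", "obj_5", "obj_4"], "obj_5")]
    print(human_accuracy(games))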
“…This improvement is because the question generator has the chance to better explore possible questions. Additionally, the greedy approach outperforms the other decoding strategies in the RL baseline of [26]. This illustrates that the distribution over words obtained from the softmax in the question generator is not very peaked, and the difference between the best and second-best word is often small.…”
Section: GuessWhat?!
confidence: 86%
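The remark about the softmax not being very peaked can be illustrated with a minimal sketch. The snippet below (hypothetical logits, NumPy only) performs one step of greedy decoding and measures the gap between the best and second-best word probabilities; a small gap means sampling will frequently pick a different word than greedy decoding.

    import numpy as np

    def softmax(logits):
        exps = np.exp(logits - logits.max())
        return exps / exps.sum()

    # Hypothetical logits over a tiny vocabulary at one decoding step.
    logits = np.array([2.1, 2.0, 0.5, -1.0])
    probs = softmax(logits)

    # Greedy decoding selects the argmax word at every step.
    greedy_word = int(np.argmax(probs))

    # A small gap between the two largest probabilities means the
    # distribution is not very peaked, so sampling often diverges from greedy.
    top_two = np.sort(probs)[::-1][:2]
    print(greedy_word, top_two[0] - top_two[1])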
“…To train a questioner for solving the GuessWhat?! game, [3,7] construct an "oracle" network to mimic the answerer's behavior, regard it as part of the environment in the reinforcement learning setup and then apply the REINFORCE algorithm (or Monte Carlo Policy Gradient). The questioner learns to ask critical questions that help identify the target object by interacting with the oracle.…”
Section: Related Work
confidence: 99%
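As a rough sketch of the setup described in that excerpt, the PyTorch snippet below applies a REINFORCE-style Monte Carlo policy-gradient update to a toy questioner policy, treating the oracle's answers as part of the environment. All names and dimensions here are illustrative assumptions, not the implementation of [3,7].

    import torch

    # Toy questioner policy: maps a 128-d dialogue state to logits over
    # a 500-word vocabulary. The oracle answering the questions is treated
    # as part of the environment and is not shown here.
    policy = torch.nn.Linear(128, 500)
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

    def reinforce_update(episode_states, episode_actions, reward):
        # Monte Carlo policy gradient: the whole dialogue receives a single
        # terminal reward (e.g. 1 if the guesser finds the target, else 0).
        log_probs = []
        for state, action in zip(episode_states, episode_actions):
            logits = policy(state)
            log_probs.append(torch.log_softmax(logits, dim=-1)[action])
        loss = -reward * torch.stack(log_probs).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()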
“…This progress, in turn, generated an emerging research area, the learning of goal-oriented dialogs [6]. This research involves agents that conduct a multi-turn dialogue to achieve some task-specific goal, such as locating a specific object in a group of objects [7], inferring which image the user is thinking about [8], and providing customer services and restaurant reservations [6]. All these tasks require that the agent possesses the ability to conduct a multi-round dialog and to track the inter-dependence of each question-answer pair.…”
Section: Introduction
confidence: 99%