End-to-end optimization of goal-driven and visually grounded dialogue systems

Strub, Florian; Vries, Harm de; Mary, Jérémie; Piot, Bilal; Courville, Aaron; Pietquin, Olivier

doi:10.24963/ijcai.2017/385

Cited by 82 publications

(120 citation statements)

References 4 publications

Supporting

Mentioning

119

Contrasting


“…We ask three human subjects to play on the same split and the game is recognised as successful if at least two of them give the right answer. In our experiment, the average performance of humans was 79% compared to 52% and 70% for the supervised [9] and RL [26] models. We are even better than a model proposed in [37] (76%), which has three complex hand-crafted rewards.…”

Section: Guesswhat (mentioning)

confidence: 79%

“…This improvement is because the question generator has the chance to better explore possible questions. Additionally, the greedy approach outperforms others in the RL baseline in [26]. This illustrates that the distribution of the words obtained from the softmax in the question generator is not very peaked and the difference between the best and second best word is often small.…”

Section: Guesswhat (mentioning)

confidence: 86%

“…Since our approach seeks uncertain words, those words are exploited at training time, which leads to lower variance (a more peaked distribution) and better performance of the greedy selection. Beam search significantly increases performance when we carry out 5 rounds (as in [9,26]) of question-answering. This is because the most informative words are selected by our approach which, combined with the beam-search's mechanism for forward exploration, leads to better performance.…”

Section: Guesswhat (mentioning)

confidence: 99%


What's to Know? Uncertainty as a Guide to Asking Goal-Oriented Questions

Abbasnejad

Shi

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


One of the core challenges in Visual Dialogue problems is asking the question that will provide the most useful information towards achieving the required objective. Encouraging an agent to ask the right questions is difficult because we don't know a priori what information the agent will need to achieve its task, and we don't have an explicit model of what it knows already. We propose a solution to this problem based on a Bayesian model of the uncertainty in the implicit model maintained by the visual dialogue agent, and in the function used to select an appropriate output. By selecting the question that minimises the predicted regret with respect to this implicit model, the agent actively reduces ambiguity. The Bayesian model of uncertainty also enables a principled method for identifying when enough information has been acquired and an action should be selected. We evaluate our approach on two goal-oriented dialogue datasets, one for a visual-based collaboration task and the other for a negotiation-based task. Our uncertainty-aware information-seeking model outperforms its counterparts in these two challenging problems.


“…To train a questioner for solving the GuessWhat?! game, [3,7] construct an "oracle" network to mimic the answerer's behavior, regard it as part of the environment in the reinforcement learning setup and then apply the REINFORCE algorithm (or Monte Carlo Policy Gradient). The questioner learns to ask critical questions that help identify the target object by interacting with the oracle.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Goal-Oriented Visual Dialog Agents: Imitating and Surpassing Analytic Experts

Chang

Peng

2019

2019 IEEE International Conference on Multimedia and Expo (ICME)


This paper tackles the problem of learning a questioner in the goal-oriented visual dialog task. Several previous works adopt model-free reinforcement learning. Most pretrain the model from a finite set of human-generated data. We argue that using limited demonstrations to kick-start the questioner is insufficient due to the large policy search space. Inspired by a recently proposed information-theoretic approach, we develop two analytic experts to serve as a source of high-quality demonstrations for imitation learning. We then take advantage of reinforcement learning to refine the model towards the goal-oriented objective. Experimental results on the GuessWhat?! dataset show that our method has the combined merits of imitation and reinforcement learning, achieving state-of-the-art performance.
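
The citation statement above describes treating an oracle as part of the environment and training the questioner with REINFORCE. The following is a minimal sketch of that idea on a toy problem, not the setup used in any of the cited papers. The two-object world, the one-question episode, the uniform guesser, and every name in the code (`OBJECTS`, `oracle`, `play_episode`, `train`) are assumptions made for illustration only.

```python
import math
import random

# Hypothetical toy world: two objects, each with two binary attributes.
# Attribute 0 separates the objects; attribute 1 is identical for both.
OBJECTS = [(0, 1), (1, 1)]


def oracle(target, question):
    """Fixed answerer, treated as part of the environment."""
    return OBJECTS[target][question] == 1


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def play_episode(logits, rng):
    """Sample one question, get the oracle's answer, then guess an object."""
    target = rng.randrange(len(OBJECTS))
    probs = softmax(logits)
    question = rng.choices(range(len(probs)), weights=probs)[0]
    answer = oracle(target, question)
    # The guesser picks uniformly among objects consistent with the answer.
    candidates = [i for i, obj in enumerate(OBJECTS)
                  if (obj[question] == 1) == answer]
    guess = rng.choice(candidates)
    reward = 1.0 if guess == target else 0.0
    return question, probs, reward


def train(episodes=5000, lr=0.1, seed=0):
    """REINFORCE on a softmax policy over questions (no baseline)."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(episodes):
        question, probs, reward = play_episode(logits, rng)
        for k in range(len(logits)):
            # Gradient of log pi(question) with respect to logit k.
            grad = (1.0 if k == question else 0.0) - probs[k]
            logits[k] += lr * reward * grad
    return softmax(logits)


if __name__ == "__main__":
    print(train())  # probability mass should shift toward question 0
```

In this sketch the informative question always leads to a correct guess while the uninformative one succeeds only half the time, so the policy gradient pushes probability toward the informative question. A real questioner would generate word sequences over several turns, and a baseline is commonly subtracted from the reward to reduce variance.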


“…This progress, in turn, generated an emerging research area, the learning of goal-oriented dialogs [6]. This research involves agents that conduct a multi-turn dialogue to achieve some task-specific goal, such as locating a specific object in a group of objects [7], inferring which image the user is thinking about [8], and providing customer services and restaurant reservations [6]. All these tasks require that the agent possesses the ability to conduct a multi-round dialog and to track the inter-dependence of each question-answer pair.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient Dialog Policy Learning via Positive Memory Retention

Zhao

Tresp

2018

2018 IEEE Spoken Language Technology Workshop (SLT)


This paper is concerned with the training of recurrent neural networks as goal-oriented dialog agents using reinforcement learning. Training such agents with policy gradients typically requires a large number of samples. However, collecting the required data in the form of conversations between chatbots and human agents is time-consuming and expensive. To mitigate this problem, we describe an efficient policy gradient method using positive memory retention, which significantly increases sample efficiency. We show that our method is 10 times more sample-efficient than policy gradients in extensive experiments on a new synthetic number guessing game. Moreover, in a real-world visual object discovery game, the proposed method is twice as sample-efficient as policy gradients and shows state-of-the-art performance.
