2019 IEEE International Conference on Multimedia and Expo (ICME) 2019
DOI: 10.1109/icme.2019.00096

Learning Goal-Oriented Visual Dialog Agents: Imitating and Surpassing Analytic Experts

Abstract: This paper tackles the problem of learning a questioner in the goal-oriented visual dialog task. Several previous works adopt model-free reinforcement learning. Most pretrain the model from a finite set of human-generated data. We argue that using limited demonstrations to kick-start the questioner is insufficient due to the large policy search space. Inspired by a recently proposed information theoretic approach, we develop two analytic experts to serve as a source of high-quality demonstrations for imitation …

Cited by 3 publications (2 citation statements)
References 8 publications
“…The History-Advantage Sequence Training (HAST) comprehensively integrates the dialogue history with co-attention modules for visual context in history encoding and one history-aware gate. For more efficient training dialogue models with RL, the probabilistic framework is introduced [9] with an Information Gain Expert and a Target Posterior Expert, which provide virtually unlimited expert demonstrations for pre-training the questioner and will be refined for a even better policy with RL.…”
Section: Visual-based Dialogue Strategies Optimization (mentioning)
confidence: 99%
“…Currently, bridging the gap between vision and language, such as visual captioning [1,2], visual question answering [3,4] and visual dialog [5,6,7], has attracted huge attention in the computer vision community. Visual dialog can be regarded as an extension of VQA.…”
Section: Introduction (mentioning)
confidence: 99%