Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1152

Improving Generative Visual Dialog by Answering Diverse Questions

Abstract: Prior work on training generative Visual Dialog models with reinforcement learning (Das et al., 2017b) has explored a Q-BOT-A-BOT image-guessing game and shown that this 'self-talk' approach can lead to improved performance at the downstream dialog-conditioned image-guessing task. However, this improvement saturates and starts degrading after a few rounds of interaction, and does not lead to a better Visual Dialog model. We find that this is due in part to repeated interactions between Q-BOT and A-BOT during …

Cited by 31 publications (37 citation statements)
References 13 publications
“…We agree with (Thomason et al., 2019) that incremental evaluation metrics such as ours should look further back into the dialogue history. We believe that language and vision systems should also be evaluated on aspects such as grammaticality, truthfulness, diversity and other aspects as done in previous work (Lee et al., 2018; Ray et al., 2019; Xie et al., 2020; Murahari et al., 2019). In this paper we focus on whether a question is effective and referential considering the dialogue history and the visual context.…”
Section: Previous Work (mentioning)
confidence: 98%
“…other, which we call the Oracle, answers. For VisDial most of the work focused on the answerer, but in-depth evaluation has been carried out on the questioner too (e.g., Murahari et al. (2019); Testoni et al. (2019)). For GuessWhat?…”
Section: Introduction (mentioning)
confidence: 99%
“…because of the simplicity of its dialogue structure (polar question-answer pairs). Recent work in the literature highlights the inability of the accuracy in the guessing task to serve as a good proxy of the quality of the underlying dialogues, with a particular focus on surface-level features such as the presence of repetitions (Murahari et al., 2019; Testoni et al., 2019). We extend this claim by looking at hallucination, an under-studied but crucial issue in Visual Dialogues.…”
Section: Related Work (mentioning)
confidence: 99%
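The repeated Q-BOT questioning described in the abstract, and the surface-level repetition that the citing papers focus on, can be made concrete by counting how many rounds of a generated dialog re-ask an earlier question. The following is a minimal, hypothetical sketch. The function names `normalize` and `repetition_stats` and the lowercase/strip-punctuation normalization rule are assumptions for illustration only. This is not the metric defined by Murahari et al. (2019) or by the citing papers.

```python
from __future__ import annotations

from collections import Counter


def normalize(question: str) -> str:
    # Hypothetical normalization (an assumption): lowercase and drop trailing
    # punctuation/spaces so "Is it sunny?" and "is it sunny" count as the same question.
    return question.strip().lower().rstrip("?.! ")


def repetition_stats(questions: list[str]) -> dict[str, float]:
    """Count how many dialog rounds repeat a question asked in an earlier round."""
    normed = [normalize(q) for q in questions]
    counts = Counter(normed)
    rounds = len(normed)
    unique = len(counts)
    # Every occurrence of a question beyond its first counts as one repeated round.
    repeated = rounds - unique
    return {
        "rounds": rounds,
        "unique_questions": unique,
        "repeated_rounds": repeated,
        "repetition_rate": repeated / rounds if rounds else 0.0,
    }


if __name__ == "__main__":
    dialog = [
        "how many people are there?",
        "is it sunny?",
        "Is it sunny?",
        "what color is the bus?",
        "is it sunny",
    ]
    print(repetition_stats(dialog))
```

For the five-round example dialog, three distinct questions are asked, so two rounds are counted as repeats (a repetition rate of 0.4). An exact-match count like this only captures verbatim or near-verbatim repeats. Paraphrased repetitions would need a softer, for example embedding-based, similarity measure.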