“…This suggests that our proposed question-guided video representations are effective for VideoQA. When comparing …

Table 2: Comparison with existing approaches on the AVSD public test set: Naïve Fusion (Alamri et al., 2019b; Zhuang et al., 2019), Attentional Fusion (Hori et al., 2018; Zhuang et al., 2019), the Multi-Source Sequence-to-Sequence model (Pasunuru and Bansal, 2019), Modified Attentional Fusion with Maximum Mutual Information objective (Zhuang et al., 2019), and Hierarchical Attention with pre-trained embedding (Le et al., 2019). For each approach, we report corpus-level BLEU-1 through BLEU-4, METEOR, ROUGE-L, and CIDEr scores.…”
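The scores in Table 2 are corpus-wide. This means n-gram matches and lengths are pooled over the whole test set before the precisions and the brevity penalty are computed. The scores are not obtained by averaging per-sentence BLEU. The following Python sketch illustrates that distinction using the standard unsmoothed BLEU definition with uniform weights. It is an assumption, not the evaluation code behind the reported numbers: toolkits commonly used for AVSD may differ in tokenization and smoothing. The helper `corpus_bleu` is hypothetical and is not NLTK's function of the same name.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references_list, max_n=4):
    """Hypothetical corpus-level BLEU-1..max_n (unsmoothed, uniform weights).

    hypotheses:      list of token lists, one per test example
    references_list: list of lists of token lists (several references per example)
    Returns [BLEU-1, ..., BLEU-max_n].
    """
    clipped = [0] * (max_n + 1)  # clipped n-gram matches, pooled over the corpus
    totals = [0] * (max_n + 1)   # candidate n-gram counts, pooled over the corpus
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references_list):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (ties -> shorter).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max count over references
            hyp_counts = ngrams(hyp, n)
            clipped[n] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n] += sum(hyp_counts.values())
    if hyp_len == 0:
        return [0.0] * max_n
    # Brevity penalty is computed once, from corpus-level lengths.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    scores, log_sum = [], 0.0
    for n in range(1, max_n + 1):
        if clipped[n] == 0 or totals[n] == 0:
            # Without smoothing, a zero precision zeroes BLEU-n and every higher order.
            scores.extend([0.0] * (max_n - n + 1))
            break
        log_sum += math.log(clipped[n] / totals[n])
        scores.append(bp * math.exp(log_sum / n))  # geometric mean of p_1..p_n
    return scores
```

In this usage example, pooling gives BLEU-1 = 4/5 = 0.8 for the two-sentence corpus. Averaging per-sentence unigram precisions would give (1.0 + 0.5) / 2 = 0.75 instead:

```python
hyps = [["the", "cat", "sat"], ["a", "dog"]]
refs = [[["the", "cat", "sat"]], [["the", "dog"]]]
print(corpus_bleu(hyps, refs))  # BLEU-1 is 0.8; BLEU-4 is 0.0 (no 4-gram matches)
```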