“…Table 1 reports the performance of several models on YouTube2Text. We compare our model with existing methods, including LSTM-E (Pan et al., 2016), h-RNN (Yu et al., 2016), aLSTMs (Gao et al., 2017), SCN (Gan et al., 2017), MTVC (Pasunuru and Bansal, 2017a), ECO (Zolfaghari et al., 2018), SibNet (Liu et al., 2018), POS (Wang et al., 2019a), MARN (Pei et al., 2019), JSRL-VCT (Hou et al., 2019), GRU-EVE (Aafaq et al., 2019), STG-KD (Pan et al., 2020), SAAT (Zheng et al., 2020), and ORG-TRL (Zhang et al., 2020). Our method outperforms all of these methods on every metric, by a large margin.…”