Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.61
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Abstract: Captioning is a crucial and challenging task for video understanding. In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene. Observable changes such as movements, manipulations, and transformations of the objects in the scene are reflected in conventional video captioning. Unlike images, actions in videos are also inherently linked to social aspects such as intentions (why the action is taking place), effects (what changes due to the action), and …

Cited by 54 publications (33 citation statements)
References 29 publications
“…KnowRef. In Emami et al. (2019), an end-to-end neural system (Lee et al., 2018) … Klein and Nabi (2020) fine-tune their unsupervised CSS model. Finally, UnifiedQA (Khashabi et al., 2020), which is pre-trained on eight seed QA datasets spanning four different formats in a unified way, is fine-tuned on WinoGrande.…”
Section: Discussion (mentioning; confidence: 99%)
“…Kocijan et al. (2019b) and Kocijan et al. (2019a) are the works most similar to ours, and we discuss the details in Section 3.1. Most recently, Klein and Nabi (2020) study a contrastive self-supervised learning approach (CSS) for WSC and DPR and also establish the first unsupervised baseline for KnowRef (Emami et al., 2019). On WinoGrande, knowledge hunting (Prakash et al., 2019) and language-model ensembles (Sakaguchi et al., 2020) have been studied.…”
Section: Related Work (mentioning; confidence: 99%)
“…Implicit Text Generation for a Visual: VisualComet (Park et al., 2020) and Video2Commonsense (Fang et al., 2020) have made initial attempts to derive implicit information about images/videos, in contrast to traditional factual descriptions that leverage only visual attributes. VisualComet aims to generate commonsense inferences about events that could have happened before, events that can happen after, and people's intents at present for each subject in a given image.…”
Section: Related Work (mentioning; confidence: 99%)
“…Commonsense Reasoning. Recently, commonsense reasoning has emerged as an important topic in both the language (Zellers et al., 2018, 2019b; Sap et al., 2019) and vision (Vedantam et al., 2015b; Zellers et al., 2019a; Zadeh et al., 2019; Fang et al., 2020) communities. Zellers et al. (2018, 2019b) build multiple-choice QA datasets for commonsense inference with text context; Zellers et al. (2019a) and Park et al. (2020) propose datasets for commonsense-based QA and captioning on still images.…”
Section: Related Work (mentioning; confidence: 99%)