2017
DOI: 10.1007/s11263-016-0987-1

Movie Description

Abstract: Audio description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset containing transcribed ADs that are temporally aligned to full-length movies. In addition we also collected and aligned movie scripts used in prior work and compare the two sour…

Cited by 274 publications (235 citation statements)
References: 79 publications
“…The images in VCR are extracted from video clips from LSMDC [67] and MovieClips. These clips vary in length from a few seconds (LSMDC) to several minutes (MovieClips).…”
Section: B1 Shot Detection Pipeline (mentioning)
confidence: 99%
“…Human Evaluation. Automatic metrics for evaluating generated sentences have frequently been shown to be unreliable and inconsistent with human judgments, especially for video description when there is only a single reference [28]. Hence, we conducted a human evaluation of sentence quality on the test set of ActivityNet-Entities.…”
Section: Video Event Description (mentioning)
confidence: 99%
“…because they might have appeared in similar contexts during training. This makes models less accountable and trustworthy, which is important if we hope such models will eventually assist people in need [2,28]. Additionally, grounded models can help to explain the model's decisions to humans and allow humans to diagnose them [21].…”
Section: Introduction (mentioning)
confidence: 99%
“…For example, in video question-answering [35], most questions center around the characters, asking who they are, what they do, and even why they act in certain ways. The related task of video captioning [25] often uses a character-agnostic approach (replacing names with someone), making the captions very artificial and uninformative (e.g. someone opens the door).…”
Section: Introduction (mentioning)
confidence: 99%