2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2015.7298940

A Dataset for Movie Description

Abstract: Descriptive video service (DVS) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset containing transcribed DVS that is temporally aligned to full-length HD movies. In addition, we collected the aligned movie scripts which have been used in prior work…
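The dataset thus pairs transcribed DVS sentences with time spans in full-length movies. As a minimal sketch of how such temporally aligned annotations might be represented and loaded (the file format, field names, and filename below are hypothetical assumptions for illustration, not the dataset's actual distribution format):

```python
import csv
from dataclasses import dataclass

@dataclass
class AlignedDescription:
    """One transcribed DVS sentence aligned to a time span in a movie.

    Hypothetical schema for illustration; the real dataset's format may differ.
    """
    movie_id: str
    start_sec: float  # snippet start within the full-length movie
    end_sec: float    # snippet end
    sentence: str     # transcribed DVS description

def load_alignments(path: str) -> list[AlignedDescription]:
    """Parse a tab-separated annotation file: movie_id, start, end, sentence."""
    records = []
    with open(path, newline="", encoding="utf-8") as f:
        for movie_id, start, end, sentence in csv.reader(f, delimiter="\t"):
            records.append(AlignedDescription(movie_id, float(start), float(end), sentence))
    return records

# Example: all descriptions for one (hypothetical) movie, in temporal order.
# alignments = load_alignments("movie_description_annotations.tsv")
# clips = sorted((a for a in alignments if a.movie_id == "movie_0001"),
#                key=lambda a: a.start_sec)
```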



Cited by 356 publications (272 citation statements).
References 58 publications.
“…Several datasets [3]–[10] for action recognition have been designed and released to the public, but some of them are not suitable for realistic events.…”
Section: Video Datasets for Human Action and Activity (mentioning)
confidence: 99%
“…Automatic Multimodal Content Analysis (AMCA), on the other hand, consists of computer-driven detection of visual and auditory elements in multimedia (Rohrbach et al. 2015; Viitaniemi et al. 2015). AMCA is cost-effective and produces consistent output, but is still insufficient for high-level semantic analysis.…”
Section: AD vs AMCA (mentioning)
confidence: 99%
“…6 we present and discuss the results of LSMDC 2015 and LSMDC 2016. This work is partially based on the original publications of Rohrbach et al. (2015c, b) and the technical report of Torabi et al. (2015). Torabi et al. (2015) collected M-VAD; Rohrbach et al. (2015c) collected the MPII-MD dataset and presented the translation-based description approach. Rohrbach et al. (2015b) proposed the Visual Labels approach.…”
Section: Fig. (mentioning)
confidence: 99%
“…(c) Focusing on more "visual" labels helps: we reduce the LSTM input dimensionality to 263 while improving the performance (Rohrbach et al. 2014), and showed performance comparable to manually annotated SRs; see Rohrbach et al. (2015c). In the following we use the best-performing "Visual Labels" approach, Table 8, line (8).…”
Section: Robust Visual Classifiers (mentioning)
confidence: 99%
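To make the quoted "Visual Labels" idea concrete, here is a minimal illustrative sketch of an LSTM sentence decoder conditioned on a compact vector of visual-label classifier scores. Only the 263-dimensional input size comes from the excerpt above; the class name, vocabulary size, hidden size, and wiring are assumptions for the sketch, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class VisualLabelsCaptioner(nn.Module):
    """Illustrative LSTM decoder conditioned on a 263-dim visual-label vector.

    Hypothetical sketch: only the 263-dim input size is taken from the
    quoted excerpt; everything else is an assumption.
    """
    def __init__(self, vocab_size: int = 10000, hidden: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.visual_proj = nn.Linear(263, hidden)  # label scores -> initial state
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, label_scores: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Initialize the LSTM state from the visual-label scores, then
        # decode the sentence tokens teacher-forced.
        h0 = torch.tanh(self.visual_proj(label_scores)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        hidden_seq, _ = self.lstm(self.embed(tokens), (h0, c0))       # (B, T, H)
        return self.out(hidden_seq)  # per-step vocabulary logits (B, T, V)
```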