2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops) 2011
DOI: 10.1109/iccvw.2011.6130425
|View full text |Cite
|
Sign up to set email alerts
|

Human Focused Video Description

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
25
0

Year Published

2013
2013
2023
2023

Publication Types

Select...
3
3
3

Relationship

0
9

Authors

Journals

citations
Cited by 34 publications
(25 citation statements)
references
References 18 publications
0
25
0
Order By: Relevance
“…Using a set of templates they generate text for their SR. Similarly, [11] uses templates to describe videos on the TREC Video summarization task. [3] follows a different route and uses a topic model to jointly model textual and visual words and a tripartite graph based on object/concept detectors.…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…Using a set of templates they generate text for their SR. Similarly, [11] uses templates to describe videos on the TREC Video summarization task. [3] follows a different route and uses a topic model to jointly model textual and visual words and a tripartite graph based on object/concept detectors.…”
Section: Related Workmentioning
confidence: 99%
“…While for descriptions of images, recent approaches have proposed to statistically model the conversion from images to text [5,15,16,18], most approaches for video description use rules and templates to generated video descriptions [14,9,2,10,11,26,3,8]. Although these works have started exploring the domain of describing visual content, important research questions remain: (1) How to best approach the conversion from visual information to linguistic expressions?…”
Section: Introductionmentioning
confidence: 99%
“…Recent approaches to describing video with natural language have made use of templates, retrieval, or language models [11], [59], [60], [60], [61], [62], [63], [64]. To our knowledge, we present the first application of deep models to the video description task.…”
Section: Prior Workmentioning
confidence: 99%
“…Finally, the method applies syntactic rules to generate a whole sentence. Following this work, a series of studies are conducted to utilize such technique to enhance different multimedia applications [2], [3], [4]. And there are some works tackle the problem with probabilistic graphical model.…”
Section: B Visual Captioningmentioning
confidence: 99%