2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016
DOI: 10.1109/cvpr.2016.495

Unsupervised Learning from Narrated Instruction Videos

Abstract: We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after each other and linked by joint constraints to obtain a sing…

Cited by 233 publications (207 citation statements)
References: 25 publications
“…Please note that there are other captioning tasks that are related to our research, such as dense captioning [22] and video captioning [1,48].…”
Section: Related Work (mentioning)
confidence: 99%
“…We propose a weighting strategy based on words' similarity by considering the semantic information. For word $s_i$, suppose the synonym set ('synset') of its $k$th meaning is $ss_{ik}$ by WordNet, we compute the weight of the synset as follows:…”
Section: Weighted Training (mentioning)
confidence: 99%
“…Among them, instructional videos provide more intuitive visual examples, and will be focused on in this paper. With the explosion of video data on the Internet, people around the world have uploaded and watched substantial instructional videos [6], [59], covering miscellaneous categories. As suggested by the scientists in educational psychology [54], novices often face difficulties in learning from the whole realistic task, and it is necessary to divide the whole task into smaller segments or steps as a form of simplification.…”
Section: Introduction (mentioning)
confidence: 99%
“…Accordingly, a variety of relative tasks have been studied by modern computer vision community in recent years (e.g., action temporal localization [74], [80], video summarization [23], [49], [79] and video caption [35], [77], [83], etc). Also, increasing efforts have been devoted to exploring different challenges of instructional video analysis [6], [31], [59], [82]. As evidence, Fig. 2 shows the growing number of publications in the top venues over the recent ten years.…”
Section: Introduction (mentioning)
confidence: 99%