2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00840

Learning and Using the Arrow of Time

Abstract: Figure 1: Seeing these ordered frames from videos, can you tell whether each video is playing forward or backward? Depending on the video, solving the task may require (a) low-level understanding (e.g. physics), (b) high-level reasoning (e.g. semantics), or (c) familiarity with very subtle effects or with (d) camera conventions. In this work, we learn and exploit several types of knowledge to predict the arrow of time automatically with neural network models trained on large-scale video data.

Cited by 346 publications (253 citation statements)
References 15 publications
“…Self-supervised learning defines a proxy task on unlabeled data and uses the pseudo-labels of that task to provide the model with supervisory signals. It is used in machine vision with proxy tasks such as predicting arrow of time [79], missing pixels [50], position of patches [14], image rotations [23], synthetic artifacts [33], image clusters [9], camera transformation in consecutive frames [3], rearranging shuffled patches [48], video colourization [73], and tracking of image patches [77] and has demonstrated promising results in learning and transferring visual features.…”
Section: Self-supervised Learning
confidence: 99%
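The arrow-of-time pretext task listed above turns unlabeled video into supervised training pairs: a clip is reversed at random, and the playback direction becomes the pseudo-label. A minimal sketch of that label-generation step, assuming frames are any ordered sequence (the function name and representation are illustrative, not the cited work's API):

```python
import random

def arrow_of_time_example(frames, p_reverse=0.5, rng=random):
    """Build one (clip, pseudo-label) pair for the arrow-of-time pretext task.

    Label 1 means forward playback; label 0 means the clip was reversed.
    No human annotation is needed: the label comes from the transformation.
    """
    if rng.random() < p_reverse:
        return list(reversed(frames)), 0
    return list(frames), 1

# Usage: each call yields a training example for a direction classifier.
clip, label = arrow_of_time_example(["f0", "f1", "f2", "f3"])
```

In a full pipeline, a network would be trained to predict `label` from `clip`, and its learned features transferred to downstream tasks.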
“…Self-supervised learning on video collections. Learning from video [2,10,15,17,21,22,30,31,35,40,42,47,52,62,64] is a powerful paradigm, as unlike with image collections, there is additional temporal and sequential information. The aim of self-supervised learning from video can be to learn to predict future frames [47], or to learn to predict depth [12,14,62].…”
Section: Related Work
confidence: 99%
“…Self-supervision for Action Recognition. Self-supervision methods learn representations from the temporal [13,59] and multi-modal structure of video [1,25], leveraging pretraining on a large corpus of unlabelled videos. Methods exploiting the temporal consistency of video have predicted the order of a sequence of frames [13] or the arrow of time [59].…”
Section: Related Work
confidence: 99%
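The frame-order pretext task mentioned above generates labels the same way: permute a short clip and ask whether temporal order was preserved. A hedged sketch of that labeling step (names are illustrative, not the cited method's implementation):

```python
import itertools
import random

def frame_order_example(frames, rng=random):
    """Build one (clip, pseudo-label) pair for a frame-order pretext task.

    A random permutation of the clip is drawn; the label is 1 if the
    permutation keeps the frames in temporal order, else 0.
    """
    perms = list(itertools.permutations(range(len(frames))))
    perm = rng.choice(perms)
    in_order = 1 if list(perm) == sorted(perm) else 0
    return [frames[i] for i in perm], in_order

# Usage: short tuples (3-4 frames) keep the permutation space small.
clip, label = frame_order_example(["f0", "f1", "f2"])
```

Because only one of the permutations is the identity, negative examples dominate; practical variants balance the classes or restrict the permutation set.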