2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.79
Unsupervised Representation Learning by Sorting Sequences

Abstract: We present an unsupervised representation learning approach using videos without semantic labels. We leverage the temporal coherence as a supervisory signal by formulating representation learning as a sequence sorting task. We take temporally shuffled frames (i.e., in non-chronological order) as inputs and train a convolutional neural network to sort the shuffled sequences. Similar to comparison-based sorting algorithms, we propose to extract features from all frame pairs and aggregate them to predict the corr…
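The abstract describes the pretext task: shuffle a sequence of frames, extract features from all frame pairs (mirroring comparison-based sorting), and train the network to recover the original order. A minimal sketch of how such training examples could be constructed is below; the helper name `make_sorting_example` and the use of random vectors as stand-ins for CNN frame features are assumptions for illustration, not the authors' implementation.

```python
# Sketch of the sequence-sorting pretext task: given n frames, apply a
# random permutation, build features for every unordered frame pair,
# and use the permutation's index as the classification label.
import itertools
import numpy as np

def make_sorting_example(frames, rng):
    """frames: (n, d) array of per-frame features (stand-ins for CNN
    features). Returns (pairwise_features, permutation_label)."""
    n = len(frames)
    perms = list(itertools.permutations(range(n)))
    label = int(rng.integers(len(perms)))          # which shuffle was applied
    shuffled = frames[list(perms[label])]
    # Comparison-based aggregation: concatenate the features of every
    # unordered frame pair, echoing pairwise comparisons in sorting.
    pairs = [np.concatenate([shuffled[i], shuffled[j]])
             for i, j in itertools.combinations(range(n), 2)]
    return np.stack(pairs), label

rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8))   # 4 frames, 8-dim features each
x, y = make_sorting_example(frames, rng)
print(x.shape, y)                      # x: (6, 16) pair features; y in [0, 24)
```

A classifier over the aggregated pair features would then predict `y`, so the network can only succeed by learning temporally discriminative frame representations.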

Cited by 570 publications (452 citation statements)
References 32 publications (70 reference statements)
“…The results are given in Table 4; four phenomena can be observed. First, when self-supervised training uses only UCF101, our DPC (60.6%) outperforms all previous methods under similar settings. Note that OPN [22] performs worse when the input resolution increases, which indicates that a simple self-supervised task like order prediction may not capture the rich semantics of videos. Second, when using Kinetics-400 for self-supervised pre-training, our DPC (68.2%) outperforms all the previous methods by a large margin.…”
Section: Comparison With State-of-the-art Methods
“…In the context of video data, the temporal structure of video data has been exploited to fine-tune networks on training data without labels [34,2] (https://github.com/annusha/unsup_temp_embed). The temporal ordering of video frames has also been used to learn feature representations for action recognition [20,23,9,4]. Lee et al [20] learn a video representation in an unsupervised manner by solving a sequence sorting problem.…”
Section: Related Work
confidence: 99%
“…In addition, these methods are often linear, and/or treat the data as cross-sectional, and thus do not exploit non-linear relationships. Self-supervised learning has already been successfully applied to time-series video data in the field of computer vision, where Lee et al developed a solution based on time-shuffling [5], which inspired our method.…”
Section: Introduction
confidence: 99%