2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.399
LSTM Self-Supervision for Detailed Behavior Analysis

Cited by 53 publications (42 citation statements)
References 22 publications
“…In the context of video data, the temporal structure of video has been exploited to fine-tune networks on training data without labels [34,2] (code: https://github.com/annusha/unsup_temp_embed). The temporal ordering of video frames has also been used to learn feature representations for action recognition [20,23,9,4].…”
Section: Related Work (mentioning; confidence: 99%)
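The temporal-ordering idea the excerpt above describes can be illustrated with a small sketch. This is not code from the cited paper; it is a hypothetical sampler for the common binary "order verification" pretext task: given three frames, the model must predict whether they appear in true temporal order or have been shuffled.

```python
import random

def make_order_verification_pair(frames, stride=1):
    """Sample a (triplet, label) pair for temporal order verification.

    frames: any sequence of frame identifiers (indices, paths, arrays).
    Returns a triplet and label 1 if the triplet is in true temporal
    order, or label 0 if the first two frames have been swapped.
    """
    # Pick a temporally ordered triplet at a random starting point.
    start = random.randrange(len(frames) - 2 * stride)
    triplet = [frames[start],
               frames[start + stride],
               frames[start + 2 * stride]]
    if random.random() < 0.5:
        return triplet, 1          # positive: correct temporal order
    # Negative: swap the first two frames to break the ordering.
    return [triplet[1], triplet[0], triplet[2]], 0
```

Pairs produced this way can be fed to any classifier; no manual labels are needed, since the supervision signal comes from the video's own frame order.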
“…Other types of pretext tasks proposed for unsupervised learning include understanding the correct order of video frames [6,36] or predicting the spatial relation between image patches [12]; e.g., jigsaw puzzle solving as a pretext task was exploited by Noroozi and Favaro [37]. In another work, Noroozi et al. [38] proposed to train an unsupervised model by counting the primitive elements of images.…”
Section: Related Work (mentioning; confidence: 99%)
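The jigsaw-puzzle pretext task mentioned above can also be sketched. This is an illustrative sampler, not the cited implementation: instead of a binary ordered/shuffled label, the model classifies WHICH permutation was applied to a set of patches, one class per entry in a fixed permutation list (Noroozi and Favaro use a maximal-Hamming-distance subset of all orderings; here we simply enumerate them for a small patch count).

```python
import itertools
import random

def make_permutation_task(patches, permutations):
    """Sample a (shuffled_patches, class_index) pair for the jigsaw task.

    permutations: a fixed list of index tuples; the class label is the
    position of the applied permutation in this list.
    """
    idx = random.randrange(len(permutations))
    perm = permutations[idx]
    shuffled = [patches[p] for p in perm]  # reorder patches by perm
    return shuffled, idx

# For 4 patches, all 24 orderings are few enough to enumerate directly.
PERMS = list(itertools.permutations(range(4)))
```

A network trained to recover `idx` from the shuffled patches must learn the spatial (or, on videos, temporal) relations between them, which is the source of the self-supervision signal.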
“…Instead of the binary task of tuple verification like Misra et al. [32], our self-supervised task is to predict the exact permutation of the patches, much like the jigsaw puzzle task of Noroozi and Favaro [34], only on videos. Some recent approaches have used temporal coherency-based self-supervision on video sequences to model fine-grained human poses and activities [31] and animal behavior [7]. Our model is not specialized for motor skill learning like [7], and we do not require bounding boxes for humans in the video frames as in [31].…”
Section: Related Work (mentioning; confidence: 99%)