2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00129

Broaden Your Views for Self-Supervised Video Learning

Cited by 84 publications (45 citation statements) · References 32 publications
“…Some examples include temporal consistency [190], [191], interframe predictability [192], geometric transformations [193], motion statistics [194] or playback speed [195], [196]. Moreover, there has been a recent surge of self-supervised vision methods, such as SimCLR [182] and Barlow Twins [197] for images or BraVe [198] for video. One such method, DINO [199], has achieved impressive results for image Transformers.…”
Section: The Road Ahead
Citation type: mentioning (confidence: 99%)
“…Self-supervised Learning in Videos. While self-supervised learning in videos was initially dominated by approaches based on pretext tasks unique to the video domain [2,31,42,53,55,61,69,78-80,83], recent work focuses more on contrastive losses similar to the image domain [21,30,34,35,63,64]. A combination of previous … Our video attention block takes a 3D tensor and applies self-attention along the temporal and spatial dimensions.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
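
The quoted snippet describes the citing paper's space-time attention block only at a high level. As a minimal sketch, and assuming a PyTorch-style implementation, a factorized video attention block of this kind could look roughly like the following; the class name, tensor layout, and head count are illustrative assumptions, not the cited paper's actual code.

import torch
import torch.nn as nn

class FactorizedVideoAttention(nn.Module):
    # Hypothetical block: self-attention along time, then along space,
    # over a (batch, time, space, channels) feature tensor.
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, s, c = x.shape
        # Temporal attention: attend across frames at each spatial location.
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, c)
        xt, _ = self.temporal_attn(xt, xt, xt)
        x = xt.reshape(b, s, t, c).permute(0, 2, 1, 3)
        # Spatial attention: attend across locations within each frame.
        xs = x.reshape(b * t, s, c)
        xs, _ = self.spatial_attn(xs, xs, xs)
        return xs.reshape(b, t, s, c)

# Example: 2 clips, 8 frames, a 7x7 spatial grid, 64 channels.
feats = torch.randn(2, 8, 49, 64)
out = FactorizedVideoAttention(dim=64)(feats)
print(out.shape)  # torch.Size([2, 8, 49, 64])

Residual connections, normalization, and the exact ordering of the attention operations would follow the citing paper's design rather than this toy layout.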
“…Unlike these works, we jointly vary spatial and temporal resolutions and use a predictive objective as self-supervision. The idea of views with limited locality is explored in [8,64]. While [8] uses views of varying locality for disentangling the representation space into temporally local and global features using contrastive objectives, our approach uses view locality to learn correspondences along and across dimensions with our predictive objective.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
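
This predictive, multi-resolution view setup is also the core idea of the indexed paper (a "narrow" and a "broad" view of the same video, with one predicting the representation of the other). As a toy sketch assuming PyTorch, the snippet below trains a narrow-clip embedding to regress a broad-clip embedding; the sampling scheme, encoder, predictor, and cosine loss are placeholder assumptions, not the cited architectures.

import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_views(video: torch.Tensor, narrow_len: int = 8, broad_len: int = 32):
    # Sample a short (narrow) clip and a long (broad) clip from the same video.
    # video: (batch, time, feature_dim) pre-extracted frame features, for simplicity.
    t = video.shape[1]
    n0 = torch.randint(0, t - narrow_len + 1, (1,)).item()
    b0 = torch.randint(0, t - broad_len + 1, (1,)).item()
    return video[:, n0:n0 + narrow_len], video[:, b0:b0 + broad_len]

class ClipEncoder(nn.Module):
    # Mean-pools a clip over time and projects it to an embedding.
    def __init__(self, in_dim: int, out_dim: int = 128):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        return self.proj(clip.mean(dim=1))

narrow_enc, broad_enc = ClipEncoder(64), ClipEncoder(64)
predictor = nn.Linear(128, 128)  # predicts the broad embedding from the narrow one

video = torch.randn(4, 64, 64)   # (batch, time, feature_dim)
narrow, broad = sample_views(video)
pred = predictor(narrow_enc(narrow))
with torch.no_grad():            # treat the broad branch as a fixed target here
    target = broad_enc(broad)
loss = 1 - F.cosine_similarity(pred, target, dim=-1).mean()
loss.backward()

In the actual methods, the broad branch would typically be a separate backbone trained with a stop-gradient or momentum update rather than a frozen forward pass, and raw video frames would replace the pre-extracted features used here.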