2021
DOI: 10.48550/arxiv.2106.10137
Preprint

Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting

Abstract: Instance-level contrastive learning techniques, which rely on data augmentation and a contrastive loss function, have found great success in the domain of visual representation learning. They are not suitable for exploiting the rich dynamical structure of video, however, as operations are done on many augmented instances. In this paper we propose "Video Cross-Stream Prototypical Contrasting", a novel method which predicts consistent prototype assignments from both RGB and optical flow views, operating on sets o…

Cited by 1 publication (1 citation statement)
References 49 publications (78 reference statements)
“…In the same line of the SimCLR framework, Han et al [19] have proposed a new self-supervised co-training method called CoCLR, in which contrastive learning with positive samples mining is used to align RGB and optical flow data. Toering et al combine the co-training mechanism and the idea of the SwAV approach [12] to propose prototypical contrastive learning [18] as an alternative to instance-based contrastive learning. Similarly, audio data can provide a useful information about the content of videos.…”
Section: Cross-modal Video Representation Learning
Citation type: mentioning
confidence: 99%