2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv51458.2022.00092
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting

Cited by 15 publications (28 citation statements) | References 42 publications
“…2) context-based models that consider the contextual features of data, such as spatial or temporal relationships between different parts of each sample or inter-sample similarity (e.g. clustering methods); 3) semantic-based models that are trained on automatically generated semantic labels, such as image depth and moving objects; and 4) cross-modality-based models in which the pretext task is to verify whether two channels of input data are relevant/synchronised, such as RGB-flow [139], visual-audio [8] or visual-text [112] correspondence. However, this classification scheme is specifically designed for studies on CV problems.…”
Section: Background and Framework (mentioning)
confidence: 99%
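The quoted taxonomy's fourth category (cross-modality correspondence) reduces to a binary "do these two channels belong together?" task. A minimal sketch for an RGB-flow pair follows; the encoders, feature dimension, clip shapes, and pairing logic are illustrative placeholders, not the setup of any cited work.

```python
import torch
import torch.nn as nn

class CorrespondenceVerifier(nn.Module):
    """Pretext task: predict whether an RGB clip and a flow clip come from the same video."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # Stand-in encoders; real systems use 3D CNNs such as S3D or R(2+1)D.
        self.rgb_enc = nn.Sequential(nn.Conv3d(3, feat_dim, 3, padding=1),
                                     nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.flow_enc = nn.Sequential(nn.Conv3d(2, feat_dim, 3, padding=1),
                                      nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.classifier = nn.Linear(2 * feat_dim, 1)  # logit: synchronised or not

    def forward(self, rgb_clip, flow_clip):
        z = torch.cat([self.rgb_enc(rgb_clip), self.flow_enc(flow_clip)], dim=1)
        return self.classifier(z).squeeze(1)

# Usage: positives pair RGB and flow from the same clip, negatives mismatch them.
model = CorrespondenceVerifier()
rgb = torch.randn(4, 3, 8, 32, 32)       # B x C x T x H x W
flow = torch.randn(4, 2, 8, 32, 32)
labels = torch.tensor([1., 1., 0., 0.])  # 1 = same video, 0 = mismatched pair
loss = nn.functional.binary_cross_entropy_with_logits(model(rgb, flow), labels)
```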
“…This intuitive work motivated researchers to use optical flow as a representation of motion for activity understanding [12,30,71,72], achieving significant improvements over RGB-only models in the supervised learning literature. Inspired by this success, many recent SSL works [33,37,80] have explored using optical flow (OF) to advance SSL beyond RGB-only baselines. Han et al. (CoCLR) [37] used OF to retrieve positive samples for the InfoNCE loss [63], which led to significant improvements.…”
Section: Introduction (mentioning)
confidence: 99%
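As a rough illustration of the flow-retrieval idea attributed to CoCLR, the sketch below mines positives for a contrastive objective via nearest neighbours in the flow feature space; the memory banks, normalisation assumption, top-k choice, and averaging over positives are simplifications, not CoCLR's exact multi-instance formulation.

```python
import torch
import torch.nn.functional as F

def flow_mined_infonce(q_rgb, q_flow, bank_rgb, bank_flow, k=3, tau=0.07):
    """q_rgb, q_flow: (D,) query features; bank_rgb, bank_flow: (N, D) memory banks.
    All features are assumed L2-normalised."""
    # Retrieve positives by nearest-neighbour search in the flow space.
    flow_sim = bank_flow @ q_flow       # (N,) flow-space similarities
    pos_idx = flow_sim.topk(k).indices  # indices of the k flow-nearest clips

    logits = (bank_rgb @ q_rgb) / tau   # (N,) RGB-space similarities
    log_prob = F.log_softmax(logits, dim=0)
    return -log_prob[pos_idx].mean()    # pull flow-mined positives closer in RGB space

# Toy usage with random, normalised features.
D, N = 128, 1024
q_rgb, q_flow = [F.normalize(torch.randn(D), dim=0) for _ in range(2)]
bank_rgb, bank_flow = [F.normalize(torch.randn(N, D), dim=1) for _ in range(2)]
print(flow_mined_infonce(q_rgb, q_flow, bank_rgb, bank_flow).item())
```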
“…Nevertheless, CoCLR did not utilize OF for training the backbone and hence may not have realized the full potential of learning motion representations. VICC [80] adapted online cluster assignment [11] to videos by treating OF as another view of RGB, and minimized the distance between RGB and OF features during online clustering. This method obtained SOTA results on various datasets.…”
Section: Introduction (mentioning)
confidence: 99%
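The cross-stream clustering described here can be summarised as a SwAV-style swapped prediction between RGB and flow: each stream is trained to predict the prototype (cluster) assignment of the other. The sketch below is a simplified version under stated assumptions; in particular, plain softmax targets replace the online Sinkhorn-Knopp assignment step, and all shapes and names are hypothetical.

```python
import torch
import torch.nn.functional as F

def cross_stream_cluster_loss(z_rgb, z_flow, prototypes, tau=0.1):
    """z_rgb, z_flow: (B, D) L2-normalised features from the two streams;
    prototypes: (K, D) learnable cluster centres (also normalised)."""
    p_rgb = z_rgb @ prototypes.T / tau   # (B, K) prototype scores per stream
    p_flow = z_flow @ prototypes.T / tau
    with torch.no_grad():
        # Targets: soft cluster assignments of the *other* stream. The actual
        # methods compute these with an online Sinkhorn-Knopp step; a plain
        # softmax is used here to keep the sketch short.
        q_rgb = F.softmax(p_rgb, dim=1)
        q_flow = F.softmax(p_flow, dim=1)
    # Swapped prediction: RGB predicts flow's assignment and vice versa, which
    # pulls RGB and flow features of the same clip towards the same prototypes.
    loss_rgb = -(q_flow * F.log_softmax(p_rgb, dim=1)).sum(dim=1).mean()
    loss_flow = -(q_rgb * F.log_softmax(p_flow, dim=1)).sum(dim=1).mean()
    return loss_rgb + loss_flow

# Toy usage.
B, D, K = 8, 128, 64
z_rgb = F.normalize(torch.randn(B, D), dim=1)
z_flow = F.normalize(torch.randn(B, D), dim=1)
prototypes = F.normalize(torch.randn(K, D), dim=1)
print(cross_stream_cluster_loss(z_rgb, z_flow, prototypes).item())
```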