2020
DOI: 10.48550/arxiv.2010.14810
Preprint

Cycle-Contrast for Self-Supervised Video Representation Learning

Abstract: We present Cycle-Contrastive Learning (CCL), a novel self-supervised method for learning video representations. Following the natural belonging and inclusion relation between a video and its frames, CCL is designed to find correspondences across frames and videos while learning contrastive representations in each domain respectively. It differs from recent approaches that merely learn correspondences across frames or clips. In our method, the frame and video representations are learned from a single n…
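
The cycle idea sketched in the abstract, contrasting a video-level embedding against its own frame-level embeddings in both directions, can be illustrated roughly as follows. This is a minimal sketch under assumed shapes and names (`cycle_contrastive_loss`, `temperature`), not the authors' released implementation, whose details are truncated from the abstract above.

```python
import torch
import torch.nn.functional as F

def cycle_contrastive_loss(video_emb, frame_emb, temperature=0.1):
    """Hypothetical cycle-contrastive objective (illustrative only).

    video_emb: (B, D) video-level embeddings.
    frame_emb: (B, T, D) frame-level embeddings from the same clips.
    Forward direction: each video should retrieve its own T frames;
    backward direction: each frame should cycle back to its source video,
    both contrasted against the other samples in the batch.
    """
    B, T, D = frame_emb.shape
    v = F.normalize(video_emb, dim=-1)                      # (B, D)
    f = F.normalize(frame_emb.reshape(B * T, D), dim=-1)    # (B*T, D)

    # video -> frame: positives are the T frames of the same video
    sim_vf = v @ f.t() / temperature                        # (B, B*T)
    pos_mask = torch.zeros(B, B * T, dtype=torch.bool, device=v.device)
    for i in range(B):
        pos_mask[i, i * T:(i + 1) * T] = True
    loss_fwd = -sim_vf.log_softmax(dim=-1)[pos_mask].mean()

    # frame -> video: the positive is the source video of each frame
    sim_fv = f @ v.t() / temperature                        # (B*T, B)
    targets = torch.arange(B, device=v.device).repeat_interleave(T)
    loss_bwd = F.cross_entropy(sim_fv, targets)

    return loss_fwd + loss_bwd
```

In a setup like this, both embeddings would naturally come from a shared backbone with a shared projection head applied before the loss, though that is a design choice assumed here rather than read off the truncated abstract.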

Cited by 7 publications (8 citation statements) | References 23 publications
“…Also, the VSKD model tested with the left-wrist accelerometer data performs better than the previous study, where accelerometer data from six locations were used [19]. In Table 3, while accelerometer data from the phone is the only modality in the testing phase, the method achieves better F-score performance than [11,21], in which either video streams or accelerometer data from phone and watch were used in the testing phase. This validates that the VSKD approach can effectively learn knowledge from the video modality to improve the accuracy of sensor-based HAR.…”
Section: Results
confidence: 82%
“…Under the linear probe setting, our method obtains the best results on both datasets. Specifically, our method with S3D and R3D-18 backbones outperforms the contrastive-learning-based approaches CBT [54] and CCL [34], respectively, by a large margin. Even when compared with MemDPC [23], which leverages two-stream information (RGB and flow) at a larger resolution, our method still shows significant advantages.…”
Section: Evaluation On Downstream Tasks
confidence: 88%
“…Recently, inspired by the success of contrastive learning on static images, a line of works has extended the contrastive learning pipeline to the video domain [17,50,44,64,41]. Typically, [22,23] employed the InfoNCE loss for dense future prediction, while [34,24] performed instance discrimination across different domains to boost video representations. Though contrastive self-supervised learning contributes to better representations, the temporal information in videos is not well leveraged.…”
Section: Self-supervised Video Representation Learning
confidence: 99%
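For context, the InfoNCE loss referenced in the statement above is most commonly written as

$$
\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp(\mathrm{sim}(q, k^{+})/\tau)}{\sum_{i=0}^{K} \exp(\mathrm{sim}(q, k_{i})/\tau)},
$$

where q is a query embedding, k⁺ its positive key, the sum runs over the positive and K negative keys, and τ is a temperature. This is the generic form, not necessarily the exact variant used in [22,23].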
“…The few direct extensions of SimCLR to video (Bai et al. 2020; Qian et al. 2020; Lorre et al. 2020) target action recognition on short clips of a few seconds. Others integrate contrastive learning by bringing together next-frame feature predictions and actual representations (Kong et al. 2020; Lorre et al. 2020), using path-object tracks to bring cycle-consistency (Wang, Zhou, and Li 2020), and considering multiple viewpoints (Sermanet et al. 2018) or accompanying modalities like audio (Alwassel et al. 2019) or text (Miech et al. 2020). We are inspired by these works to develop contrastive learning for long-range segmentation.…”
Section: Introduction
confidence: 99%