2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00967

Shot Contrastive Self-Supervised Learning for Scene Boundary Detection

Cited by 46 publications (39 citation statements)
References 23 publications
“…Furthermore, there are various works on contrastive video representation learning, which viewed contrastive learning from temporal [33], spatio-temporal [35], and temporal-equivariant [20] perspectives. Especially, [6] applied self-supervised contrastive pretext task to learn appropriate feature representation, proving its effectiveness on detecting shot frames. However, it still needs downstream task training (supervised learning) to get final results, while our approach directly yields event boundaries.…”
Section: Contrastive Representation Learning (mentioning)
confidence: 99%
“…Temporal segmentation of actions in videos has been widely explored in previous works [40,41,69,86,89,104]. Video shot boundary detection and scene detection tasks are also relevant and has been explored in many previous studies [11,29,30,65,102], which aim at finding the visual change or scene boundaries.…”
Section: Related Work (mentioning)
confidence: 99%
“…The objective of optimal transport involves solving linear programming and may cause potential computational burdens since it has O(n^3) efficiency. To solve this issue, we add an entropic regularization term equation (11) and the objective of our optimal transport distance becomes…”
Section: Cross-domain Alignment For Multimodal Summarization (mentioning)
confidence: 99%
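
The entropic regularization mentioned in the statement above is commonly solved with Sinkhorn iterations, which replace the linear program with alternating row and column rescalings. The following is a minimal sketch of that generic technique, not the citing paper's equation (11), which is elided in the excerpt. The function name `sinkhorn` and the parameters `eps` and `n_iters` are illustrative assumptions.

```python
import math

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between histograms a and b.

    Generic Sinkhorn sketch (assumed, not taken from the cited paper).
    cost: n x m list of lists; a, b: positive weights that each sum to 1.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); smaller eps is closer to exact OT.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so the plan's column sums match b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v) and its transport cost <P, C>.
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, dist
```

Each iteration costs O(nm) operations, which is why this form is often preferred over solving the linear program directly. Very small values of `eps` can underflow the kernel, so a log-domain variant may be needed in practice.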
“…Video scene detection is the most relevant task. However, previous methods only used visual information to detect the scene change [52,56,57,11,81], so the methods can not be adopted directly for Livestream videos either.…”
Section: Related Work (mentioning)
confidence: 99%