2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022
DOI: 10.1109/cvpr52688.2022.01363

Scene Consistency Representation Learning for Video Scene Segmentation

Abstract: less inductive bias to verify the quality of the shot features. Our method achieves the state-of-the-art performance on the task of Video Scene Segmentation. Additionally, we suggest a more fair and reasonable benchmark to evaluate the performance of Video Scene Segmentation methods. The code is made available.

Citations: cited by 18 publications (36 citation statements)
References: 28 publications
“…We constructed a machine learning pipeline (Figure 2) to analyze the video at both the frame and object level, and to identify keyframes. Noticeably, although models like [24,71] can detect scene transitions in an end-to-end way, their priority is based on the visual similarity of different scenes. On the contrary, our aim is to offer users a richer exploration of varied objects, ensuring that keyframes are densely populated throughout the video for comprehensive exploration.…”
Section: Keyframe Detection and Description Generation Pipeline (mentioning)
confidence: 99%
“…Subsequently, they maximize the similarity between the query and the positive key while minimizing the query’s similarity with a set of randomly selected shots. For the positive key selection, Wu et al. suggested the scene consistency selection approach [34], which enables the selection to accomplish a more challenging goal. They create a soft positive sample using query-specific individual information and an online clustering of samples in a batch to produce a positive sample.…”
Section: Related Work (mentioning)
confidence: 99%
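
The contrastive objective described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function names (`info_nce`, `soft_positive`), the temperature, the `mix` weight, and the idea of blending the query with its nearest cluster centroid are all assumptions made for illustration, and the centroids are taken as given rather than computed by online clustering.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the query is similar to the
    positive key and dissimilar to the negative shots."""
    q = normalize(query)
    pos_logit = dot(q, normalize(positive)) / temperature
    logits = [pos_logit] + [dot(q, normalize(n)) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit


def soft_positive(query, centroids, mix=0.5):
    """Hypothetical soft positive: blend the query's own embedding with
    the cluster centroid most similar to it (centroids assumed given)."""
    q = normalize(query)
    nearest = normalize(max(centroids, key=lambda c: dot(q, normalize(c))))
    return [mix * qi + (1.0 - mix) * ci for qi, ci in zip(q, nearest)]


if __name__ == "__main__":
    query = [1.0, 0.0]
    centroids = [[0.8, 0.6], [0.0, 1.0]]   # assumed output of a clustering step
    negatives = [[0.0, 1.0], [-1.0, 0.0]]  # stand-ins for randomly selected shots
    key = soft_positive(query, centroids)
    print(info_nce(query, key, negatives))
```

Under these assumptions, a positive key that lies close to the query yields a lower loss than one that lies close to a negative shot, which is the behaviour the quoted passage describes.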
“…http://kaldir.vc.in.tum.de/scannet benchmark/data efficient/ Pre-training for 3D Representation Learning. Many recent works propose to pre-train networks on source datasets with auxiliary tasks such as low-level point cloud geometric registration [27], 3D local structural prediction [78], the completion of the occluded point clouds [79], and the foreground-background feature discrimination [58], with effective learning strategies such as contrastive learning [27] and masked generative modelling [80], [81]. Then they finetune the weights of the trained networks for the downstream target tasks to boost their performances.…”
Section: Related Work (mentioning)
confidence: 99%
“…The convex decomposition [103] is conducted in an approximate manner to perform 3D scene parsing on the object parts. More approaches [104] have been proposed recently, which utilize class prototypes and masked point cloud modeling [81], [105], [106] to learn informative representations for downstream 3D scene understanding. To sum up, although approaches have been proposed to alleviate the data efficiency problem, the models for weakly supervised learning lack the capacity to recognize novel categories beyond the labeled training set.…”
Section: Related Work (mentioning)
confidence: 99%