2021
DOI: 10.48550/arxiv.2104.13537
Preprint

Shot Contrastive Self-Supervised Learning for Scene Boundary Detection

Abstract: Approach overview. Representative frames of 10 shots from 2 different scenes of the movie Stuart Little are shown. The story arc of each scene is distinguishable and semantically coherent. We consider similar nearby shots (e.g., 5 and 3) as augmented versions of each other. This augmentation approach capitalizes on the underlying film-production process and can encode the scene structure better than existing augmentation methods. Given a current shot (query) we find a similar shot (key) within its…
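
The query/key idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated, so everything beyond "pick a similar nearby shot as the key" is an assumption here: the names `select_key` and `info_nce`, the `window` parameter, cosine similarity as the similarity measure, the InfoNCE-style loss, and the temperature value are all hypothetical choices for illustration, not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def select_key(features, query_idx, window):
    """Pick the nearby shot most similar to the query shot.

    `features` is a list of per-shot feature vectors in temporal order.
    `window` (an assumed parameter) is how many shots on each side count
    as "nearby". Requires at least one neighbor inside the window.
    """
    lo = max(0, query_idx - window)
    hi = min(len(features), query_idx + window + 1)
    candidates = [i for i in range(lo, hi) if i != query_idx]
    return max(candidates, key=lambda i: cosine(features[i], features[query_idx]))


def info_nce(query, key, negatives, temperature=0.07):
    """InfoNCE-style contrastive loss for one query (assumed loss form).

    The positive pair is (query, key); `negatives` are features of other
    shots. The temperature default is an assumption, not from the source.
    """
    logits = [cosine(query, key) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[0]


# Toy 2-D "shot features": shots 0, 1, 4 look alike; shots 2, 3 look alike.
shots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
key_idx = select_key(shots, query_idx=3, window=2)
loss = info_nce(shots[3], shots[key_idx], negatives=[shots[0], shots[4]])
```

In this toy setup the query shot is paired with its most similar neighbor, and the loss is small when the key resembles the query more than the negatives do. A real system would compute the features with a learned encoder and update that encoder by minimizing the loss.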

Citations: cited by 2 publications (3 citation statements)
References: 29 publications
“…In this paper, the experiments are conducted on both SS task and SSC task. For SS task, LGSS [18] and ShotCoL [7] are currently the SOTA algorithms, so we use them as our strong baselines. Other methods such as Siamese [2], StoryGraph [30], Grouping [21] and etc., are not fresh enough to compare.…”
Section: Compared Methods (mentioning)
confidence: 99%
“…LGSS [18] raises B-Net to identify boundaries and uses multimodal features, and ShotCoL [7] leverages contrastive learning. However, their work performed a 2-classes classification which refers to whether the shot should be segmented or not, ignoring richer relation information between shots within a scene.…”
Section: Related Work 21 Video Segmentation (mentioning)
confidence: 99%