Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413725

Occlusion Detection for Automatic Video Editing

Abstract: In recent years, video has become preferred over still images. During recording, however, cameras are inevitably occluded by objects or people passing in front of them, which substantially increases the workload of video editors, who must search for such occlusions. To relieve this burden, this paper proposes a frame-level video occlusion detection method, a fundamental component of automatic video editing. The proposed method enhances th…

Cited by 13 publications
(3 citation statements)
References 47 publications
“…Besides, intelligent creation tools are in great demand as they can help users efficiently create customized dynamic content, e.g., video, animation [27,28]. Some researchers focus on several key steps, such as frame composition [60], shot selection [23,25], shot cut suggestion [35]. Others tackle high-level automatic procedures with simple user interactions and take multiple videos captured by different cameras to produce a coherent video in different application scenarios [1,24,48] using different data sources [5,31,39,54].…”
Section: Related Work (mentioning)
confidence: 99%
“…The size of the "patch" proposed in ViT [12] greatly increases the "receptive field" of the network. There have been many successful Transformer-based applications in computer vision, such as image classification [13][14][15], video detection [16,17], and action recognition [18,19]. Therefore, it is very meaningful to explore Transformer-based dual-modality action recognition methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…Active speaker detection is a multi-modal task aiming to identify active speakers from a set of candidates in an arbitrary video. This task plays an essential role in speaker diarization [7,41], speaker tracking [27,28], automatic video editing [10,19], and other applications, which has attracted extensive attention from both industry and academia. Figure 1.…”
Section: Introduction (mentioning)
confidence: 99%