2021 IEEE International Conference on Image Processing (ICIP) 2021
DOI: 10.1109/icip42928.2021.9506088

Adversarial Unsupervised Video Summarization Augmented With Dictionary Loss

Abstract: Automated unsupervised video summarization by key-frame extraction consists in identifying representative video frames, best abridging a complete input sequence, and temporally ordering them to form a video summary, without relying on manually constructed ground-truth key-frame sets. State-of-the-art unsupervised deep neural approaches consider the desired summary to be a subset of the original sequence, composed of video frames that are sufficient to visually reconstruct the entire input. They typically employ…

Cited by 3 publications (1 citation statement)
References 21 publications
“…The research challenge includes the development of corresponding DNN modules that will perform video instance segmentation, in real-time. The state-of-the-art approach is to employ Transformer [17] or CNN neural architectures [18] and to exploit adversarial learning strategies to compensate for occlusions or distortions [19] and/or employ novel training goals that augment regular supervised training with unsupervised [20] or adversarial objectives [5], in order to increase accuracy in potential use-cases. Another challenge is to accelerate those algorithms for on-board execution without sacrificing accuracy, e.g., by combining multitask training on auxiliary tasks (such as scene geometry extraction by unsupervised depth map estimation).…”
Section: B Semantic Video Instance Segmentation
Citation type: mentioning
Confidence: 99%