2021
DOI: 10.1109/tpami.2020.2983929

A Sparse Sampling-Based Framework for Semantic Fast-Forward of First-Person Videos

Abstract: Technological advances in sensors have paved the way for digital cameras to become increasingly ubiquitous, which, in turn, led to the popularity of the self-recording culture. As a result, the amount of visual data on the Internet is moving in the opposite direction of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched, stashed away in some computer folder or website. In this paper, we address the problem of creating smooth fast-forward vide…

citations
Cited by 7 publications
(7 citation statements)
references
References 33 publications
“…the Microsoft Hyperlapse (MSH) [Joshi et al, 2015], and the extended version of the Sparse Adaptive Sampling (SASv2) [Silva et al, 2021]. We used the desktop version of the MSH.…”
Section: Comparison With Hyperlapse Creation Methods
Citation type: mentioning
confidence: 99%
“…Silva et al [12] proposed modeling the frame sampling in semantic fast-forwarding as a Minimum Sparse Reconstruction problem. The authors also propose an extension [15] that aims to remove visual gaps that could break the continuity of the output video, and to smooth the speed-up transitions between video segments. A drawback of the aforementioned works consists of pre-processing steps like detecting objects and computing optical flow, which is time-consuming and rely on the accuracy of third-party methods.…”
Section: Semantic Fast-forwarding
Citation type: mentioning
confidence: 99%
“…In our previous approach [14], we built an end-to-end trainable embedding space for text and image, which is further used by an RL agent to accelerate an input video. However, unlike most of the other fast-forward methods [5], [6], [8], [11], [12], [15], [40], [41], [42], the agent cannot optimize the output speed-up rate of the video, which is essential in several applications. In this work, we extend our previous approach [14] by introducing the Skip-Aware Fast-Forward Agent (SAFFA) and an Extended Visually-guided Document Attention Network (VDAN+).…”
Section: Cross-modal Embedding For Instructional Videos
Citation type: mentioning
confidence: 99%