2016
DOI: 10.1007/978-3-319-46478-7_47

Video Summarization with Long Short-Term Memory

Abstract: We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the task as a structured prediction problem, our main idea is to use Long Short-Term Memory (LSTM) to model the variable-range temporal dependency among video frames, so as to derive both representative and compact video summaries. The proposed model successfully accounts for the sequential structure crucial to generating meaningful video summaries, leading to state-of-the-art results on two benchmark video datasets (SumMe and TVSum).
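The abstract's core idea can be illustrated with a minimal sketch (not the authors' released implementation): a bidirectional LSTM reads per-frame CNN features and predicts a per-frame importance score, and the top-scoring frames are kept as keyframes. The feature dimension, hidden size, and the 15% selection ratio below are illustrative assumptions, written in PyTorch.

import torch
import torch.nn as nn

class FrameScorerLSTM(nn.Module):
    """Hypothetical frame scorer: a bidirectional LSTM over per-frame features."""
    def __init__(self, feat_dim=1024, hidden_dim=256):
        super().__init__()
        # A bidirectional LSTM captures variable-range temporal dependency
        # in both directions along the frame sequence.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, frame_feats):              # (batch, n_frames, feat_dim)
        h, _ = self.lstm(frame_feats)            # (batch, n_frames, 2*hidden_dim)
        return torch.sigmoid(self.head(h)).squeeze(-1)  # importance in [0, 1]

# Usage: score a 300-frame video and keep the top 15% of frames as keyframes.
model = FrameScorerLSTM()
feats = torch.randn(1, 300, 1024)                # stand-in for CNN frame features
scores = model(feats)
n_keep = int(0.15 * scores.shape[1])
keyframes = scores[0].topk(n_keep).indices.sort().values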

Cited by 554 publications (724 citation statements) · References 38 publications
“…Our results are also better than the recent work of [22], where 38.6% is reported using slightly different evaluation settings, and on par with their method that uses extensive additional labeled data to train interestingness and hyper-parameters (41.8%). Further, the relatively stable average Best F1 across the different music-tracks indicates that the weighting of the components is not music-track specific per se.…”
Section: Music-guided Summaries (contrasting)
confidence: 47%
“…Such coherency can be enforced by selecting sets of consecutive frames [5] or by adding temporal regularization [11]. The balance between interestingness and coherency can be obtained using pre-segmentation methods [5], submodular optimization [6,19], or recurrent neural networks [22]. Here, we propose a model that jointly incorporates interestingness and coherency, and that is capable of adjusting the summaries based on a user-provided music-track.…”
Section: Related Work (mentioning)
confidence: 99%
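The statement above surveys how summarizers trade off interestingness against temporal coherency. As a toy illustration (not any cited system), the sketch below greedily adds pre-segmented shots under a length budget, rewarding shots that are both interesting and temporally adjacent to already selected ones; the shot list, the lambda weight, and the budget are assumed inputs.

from typing import List, Tuple

def select_segments(interest: List[float],
                    segments: List[Tuple[int, int]],
                    budget: int,
                    lam: float = 0.5) -> List[int]:
    """Greedy interestingness/coherency trade-off over pre-segmented shots."""
    chosen: List[int] = []
    used = 0
    while True:
        best, best_gain = None, 0.0
        for i, (start, end) in enumerate(segments):
            length = end - start
            if i in chosen or used + length > budget:
                continue
            # Interestingness of the candidate shot.
            gain = interest[i]
            # Coherency bonus if it is temporally adjacent to a chosen shot.
            if any(abs(i - j) == 1 for j in chosen):
                gain += lam
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            return sorted(chosen)
        chosen.append(best)
        used += segments[best][1] - segments[best][0]

# Toy usage: five 30-frame shots, keep at most 90 frames of summary.
shots = [(0, 30), (30, 60), (60, 90), (90, 120), (120, 150)]
print(select_segments([0.2, 0.9, 0.4, 0.8, 0.1], shots, budget=90))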
“…For the SumMe dataset, we compare the proposed method with seven baselines, VSUMM [10], SumTransfer [7], SUM-GAN [13], LSTM [15], MSDS-CC [27], LLR-SDS [28], and Online Motion AE [9], as shown in Table III, from which we can see that both VSUMM [10] and SumTransfer [7] cannot provide satisfactory performance, as they employ handcrafted features rather than more discriminative descriptors. We can also see that our approach yields the best performance.…”
Section: Methods (mentioning)
confidence: 99%
“…As a whole, eleven baselines are employed here for comparison, including: STIMO [26], VSUMM [10], SumTransfer [7], SUM-GAN [13], SeqDPP [6], LSTM [15], TVSum [8], Li et al. [14], MSDS-CC [27], LLR-SDS [28], and Online Motion AE [9]. The results for all of the baselines [26,10,6,7,13,14,15,8,27,28,9] are taken from those reported in their respective papers.…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 99%