Action Parsing-Driven Video Summarization Based on Reinforcement Learning

Lei, Jie; Luan, Qiao; Song, Xinhui; Liu, Xiao; Tao, Dapeng; Song, Mingli

doi:10.1109/tcsvt.2018.2860797

Cited by 69 publications

(39 citation statements)

References 27 publications

“…Finally, with respect to previously published works in IEEE TCSVT, our manuscript is most closely related to [17], [19], [25], [37] that suggest different deep-learning-based approaches for supervised video summarization. However, differently from them, our manuscript proposes a method that: i) learns summarization in a fully unsupervised manner, and ii) is the first to introduce the integration of a trainable AC model into a GAN to learn a policy for key-fragment selection and summarization.…”

Section: Relation Of the Proposed Methods With The Bibliography (mentioning)

confidence: 77%

“…[14] uses video metadata for video categorization and to learn what is important in each category, and performs category-driven summarization by maximizing the relevance between the summary and the video's category. [15], [16], [17] similarly learn category-driven summarization in various ways, e.g., by using action classifiers. [18], [19] define a summary by maximizing its relevance with the video metadata, after projecting visual and textual data in a common latent space.…”

Section: A Supervised Video Summarization (mentioning)

confidence: 99%

AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization

Apostolidis, Adamantidou, Metsai, et al. 2021

IEEE Trans. Circuits Syst. Video Technol.

This paper presents a new method for unsupervised video summarization. The proposed architecture embeds an Actor-Critic model into a Generative Adversarial Network and formulates the selection of important video fragments (that will be used to form the summary) as a sequence generation task. The Actor and the Critic take part in a game that incrementally leads to the selection of the video key-fragments, and their choices at each step of the game result in a set of rewards from the Discriminator. The designed training workflow allows the Actor and Critic to discover a space of actions and automatically learn a policy for key-fragment selection. Moreover, the introduced criterion for choosing the best model after the training ends enables the automatic selection of proper values for parameters of the training process that are not learned from the data (such as the regularization factor σ). Experimental evaluation on two benchmark datasets (SumMe and TVSum) demonstrates that the proposed AC-SUM-GAN model performs consistently well, gives state-of-the-art (SoA) results in comparison to unsupervised methods, and is also competitive with respect to supervised methods.

“…The comparison of the performance of the model on five state-of-the-art works by Gao et al [18], Muhammad et al [14], Muhammad et al [16], Muhammad et al [17], Liu et al [19] and Lei et al [21] based on f1-score are plotted in Figure 6(b). The results prove that the performance of the model surpasses most of the state-of-the-art works given the minimal number of classes trained and complexities involved.…”

Section: Number Of Key Frames Total Number Of Frames In the Video (23) (mentioning)

confidence: 99%

An aggregated deep convolutional recurrent model for event based surveillance video summarisation: A supervised approach

Sreeja, Kovoor, 2021

IET Computer Vision

Surveillance video summarisation is characterised by extracting video segments containing abnormal events from surveillance video footages. Accurate identification of abnormal events from surveillance footages is of paramount importance in surveillance video summarisation. Accordingly, the proposed framework builds an aggregated convolutional recurrent model that can precisely detect the suspicious events in a surveillance footage, by employing a supervised learning which is found to yield better results compared with unsupervised counterparts. The preliminary stage in the model is a multilayer Convolutional Neural Network for frame-level feature extraction followed by stacked bidirectional Gated Recurrent Unit for sequence-level feature extraction and classification. Since the video clips used for training are not implicit to surveillance, a block-based approach for testing on surveillance videos is proposed. The results evaluated on two custom datasets, Streets and Campus, prove that the proposed model produces remarkable results leveraging the properties of bidirectional GRU with supervised learning. Extensive experimental analysis on selection of optimum architecture is conducted which substantiates the significance of stacked bidirectional GRUs over unidirectional ones. Additionally, qualitative results ensure that summaries produced are concise, representative, complete, diverse and informative. Moreover, comparison of the performance of the proposed model with state of the art certainly proves the superiority of the proposed model.

“…This differs from prior work, which perform manually partitioning of the video into segments of the same length and do thereby not take the video content into account. Lei et al [26] propose another reinforcement learning-based summarization approach that also dynamically segments videos. The video is first segmented using a trained action classifier, so that each clip contains a single action, then a deep recurrent neural network is applied to select the most distinct frames for each clip.…”

Section: Video Summarization Using Reinforcement Learning (mentioning)

confidence: 99%

“…Reinforcement learning is used in this work to guide our summarization agent using a set of rewards that encode our underlying intuition of what qualities a successful summarization result should have. Deep reinforcement learning approaches [19][20][21] have been extensively used in a variety of computer vision tasks such as object segmentation [22], video captioning [23], action recognition [24], and also generic video summarization [25][26][27][28]. For example, Zhou and Qiao [25] develop a deep reinforcement learning-based summarization network with a diversity-representativeness reward to generate summaries, and achieve a good performance on generic video summarization.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Reinforcement Learning for Query-Conditioned Video Summarization

et al. 2019

Query-conditioned video summarization requires (1) finding a diverse set of video shots/frames that are representative of the whole video, and (2) ensuring that the selected shots/frames are related to a given query. Thus it can be tailored to different user interests, leading to a better personalized summary, and differs from generic video summarization, which only focuses on video content. Our work targets this query-conditioned video summarization task, by first proposing a Mapping Network (MapNet) in order to express how related a shot is to a given query. MapNet helps establish the relation between the two different modalities (videos and query), which allows mapping of visual information to query space. After that, a deep reinforcement learning-based summarization network (SummNet) is developed to provide personalized summaries by integrating relatedness, representativeness and diversity rewards. These rewards jointly guide the agent to select the most representative and diverse video shots that are most related to the user query. Experimental results on a query-conditioned video summarization benchmark demonstrate the effectiveness of our proposed method, indicating the usefulness of the proposed mapping mechanism as well as the reinforcement learning approach.
