Video summarization using event‐related potential responses to shot boundaries in real‐time video watching

Kim, Hyun Hee; Kim, Yong Ho

doi:10.1002/asi.24103

Cited by 3 publications

(1 citation statement)

References 43 publications

(62 reference statements)


“…This outcome into a precision of 62.6% which is higher when segregated and VSUMM (54.4%), lacking word reference methodologies (SD) (48.3%), and Key Point-Based Key-frame Selection (KBKS) (46%). Comparative models are depicted in the studies [13][14][15][16], wherein Event-Related Potential Responses, network video overview utilizing super pixels, Joint Integer Linear Programming (JILP), and cautious video summary improvements are portrayed. Out of these, the JILP approach beats different procedures, by giving an exactness of 59.9% across different datasets.…”

Section: Literature Review Deep Learning-based Video Summarization (mentioning)

confidence: 99%

Gated Recurrent Units and Recurrent Neural Network Based Multimodal Approach for Automatic Video Summarization

Kaur, Aljrees, Kumar et al. 2023


A typical video record aggregation system requires the concurrent performance of a large number of image processing tasks, including but not limited to image acquisition, pre-processing, segmentation, feature extraction, verification, and description. These tasks must be executed with utmost precision to ensure smooth system performance. Among these tasks, feature extraction and selection are the most critical. Feature extraction involves converting the large-scale image data into smaller mathematical vectors, and this process requires great skill. Various feature extraction models are available, including wavelet, cosine, Fourier, histogram-based, and edge-based models. The key objective of any feature extraction model is to represent the image data with minimal attributes and no loss of information. In this study, we propose a novel feature-variance model that detects differences in video features and generates feature-reduced video frames. These frames are then fed into a GRU-based RNN model, which classifies them as either keyframes or non-keyframes. Keyframes are then extracted to create a summarized video, while non-keyframes are reduced. Various key-frame extraction models are also discussed in this section, followed by a detailed analysis of the proposed summarization model and its results. Finally, we present some interesting observations about the proposed model and suggest ways to improve it.
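The abstract above describes detecting differences in video features and then labeling frames as keyframes or non-keyframes. As a rough illustration of the first idea only, the sketch below flags a frame as a keyframe when its intensity histogram differs from the last keyframe by more than a threshold. This is a simple frame-difference heuristic swapped in for illustration, not the paper's feature-variance model or its GRU-based classifier, whose details are not given here. The function names, the 4-bin histogram, the L1 distance, and the 0.5 threshold are all assumptions.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 4, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for pixel in frame:
        # Map the pixel value to a bin, clamping to the last bin.
        counts[min(pixel * bins // max_value, bins - 1)] += 1
    total = len(frame) or 1
    return [count / total for count in counts]


def select_keyframes(frames: Sequence[Sequence[int]],
                     threshold: float = 0.5,
                     bins: int = 4) -> List[int]:
    """Return indices of frames whose histogram differs from the most recent
    keyframe by more than `threshold` (L1 distance). Hypothetical helper:
    a stand-in heuristic, not the method of the cited paper."""
    if not frames:
        return []
    keyframes = [0]  # assumption: the first frame always counts as a keyframe
    reference = histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = histogram(frames[index], bins)
        distance = sum(abs(a - b) for a, b in zip(reference, current))
        if distance > threshold:
            keyframes.append(index)
            reference = current  # compare later frames against the new keyframe
    return keyframes


# Toy usage: five 8-pixel "frames" (dark, dark, bright, bright, dark).
toy_frames = [[0] * 8, [10] * 8, [200] * 8, [210] * 8, [0] * 8]
print(select_keyframes(toy_frames))
```

In a pipeline like the one the abstract outlines, a learned classifier would replace the fixed threshold; the heuristic here only shows how per-frame feature differences can drive keyframe selection.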
