Visual Descriptors in Methods for Video Hyperlinking

Galuščáková, Petra; Batko, Michal; Čech, Jan; Matas, Jiří; Novák, David; Pecina, Pavel

doi:10.1145/3078971.3079026

Cited by 2 publications

(2 citation statements)

References 19 publications

“…In contrast to these use cases, in the domain of endoscopic video no such additional input modalities are available. Galuščáková et al [13] investigate visual descriptors for the task of video hyperlinking within a multi-modal approach, i.e. they use visual (feature signatures, AlexNet fc7 CNN features, concept detection, and face recognition) and text-based (subtitles and automatic transcripts) input modalities.…”

Section: Related Work (mentioning)

confidence: 99%

Binary convolutional neural network features off-the-shelf for image to video linking in endoscopic multimedia databases

Petscharnig

Schöffmann

2018

Multimed Tools Appl

With rigorous long-term archival of endoscopic surgeries, vast amounts of video and image data accumulate. Surgeons cannot spend their valuable time manually searching endoscopic multimedia databases (EMDBs) or manually maintaining links to interesting sections in order to quickly retrieve relevant surgery sections. To enable surgeons to quickly access the relevant surgery scenes, we exploit the fact that surgeons record external images in addition to the surgery video, and we aim to link these images to the appropriate video sequence in the EMDB using a query-by-example approach. We propose binary Convolutional Neural Network (CNN) features off-the-shelf and compare them to several baselines: pixel-based comparison (PSNR), image structure comparison (SSIM), hand-crafted global features (CEDD and feature signatures), as well as the CNN baselines Histograms of Class Confidences (HoCC) and Neural Codes (NC). For evaluation, we use 5.5 h of endoscopic video material and 69 query images selected by medical experts, and we compare the performance of the aforementioned image matching methods in terms of video hit rate and distance to the true playback time stamp (PTS) for correct video predictions. Our evaluation shows that binary CNN features are compact yet powerful image descriptors for retrieval in the endoscopic imaging domain. They maintain state-of-the-art performance while requiring little storage space, and hence provide the best compromise.
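The query-by-example linking and the two evaluation measures named in this abstract (video hit and distance to the true PTS) can be illustrated with a minimal sketch. It is not the authors' implementation: binarizing by thresholding at zero, the 4-dimensional toy vectors, and the names `binarize`, `hamming`, and `link_image` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued vector into an int, one bit per dimension.

    Thresholding at zero is an assumption; the abstract does not say
    how the binary codes are derived from the CNN activations.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


# One indexed video frame: (video_id, pts_seconds, binary_code)
Frame = Tuple[str, float, int]


def link_image(query: Sequence[float], index: List[Frame]) -> Tuple[str, float]:
    """Return the video and time stamp of the frame closest to the query."""
    q = binarize(query)
    video_id, pts, _ = min(index, key=lambda frame: hamming(q, frame[2]))
    return video_id, pts


# Toy index of three frames from two hypothetical videos.
index = [
    ("video_a", 10.0, binarize([0.9, -0.2, 0.4, -0.7])),
    ("video_a", 20.0, binarize([-0.5, 0.3, 0.8, -0.1])),
    ("video_b", 5.0, binarize([-0.6, -0.4, -0.2, 0.9])),
]

predicted_video, predicted_pts = link_image([0.7, -0.1, 0.5, -0.3], index)

# Evaluation against a hypothetical ground truth for this query image.
true_video, true_pts = "video_a", 12.0
video_hit = predicted_video == true_video
pts_error = abs(predicted_pts - true_pts) if video_hit else None
```

Storing each descriptor as a bit string is what makes the low storage requirement plausible: one bit per dimension instead of one floating-point value, with comparison reduced to an XOR and a bit count.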

“…Video hyperlinking systems usually start from a set of anchors that define entry points of interest in collections of long videos and are required to provide, for each anchor, relevant targets within the collection. This task is usually implemented as a two-step process, first starting from a segmentation of the long videos into small segments, then selecting relevant segments for a given anchor [4,5]. This last step is cast as a video retrieval task relying on video segment comparison, where various multimodal solutions have been proposed.…”

Section: Introduction (mentioning)

confidence: 99%

A Study on Multimodal Video Hyperlinking with Visual Aggregation

Budnik

Demirdelen

Gravier

2018

2018 IEEE International Conference on Multimedia and Expo (ICME)

Video hyperlinking offers a way to explore a video collection, making use of links that connect segments having related content. Hyperlinking systems thus seek to automatically create links by connecting given anchor segments to relevant targets within the collection. In this paper, we further investigate multimodal representations of video segments in a hyperlinking system based on bidirectional deep neural networks, which achieved state-of-the-art results in the TRECVid 2016 evaluation. A systematic study of different input representations is done with a focus on the aggregation of the representation of multiple keyframes. This includes, in particular, the use of memory vectors as a novel aggregation technique, which provides a significant improvement over other aggregation methods on the final hyperlinking task. Additionally, the use of metadata is investigated leading to increased performance and lower computational requirements for the system.
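To make the idea of aggregating multiple keyframe representations concrete, the sketch below pools the keyframe vectors of each segment and ranks candidate targets for an anchor. It deliberately swaps in plain mean pooling and cosine similarity as a simple baseline; it does not implement the memory vectors or the bidirectional deep neural networks studied in the paper, and the names `mean_pool`, `cosine`, and `rank_targets` as well as the 2-dimensional toy features are assumptions.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(keyframes: Sequence[Sequence[float]]) -> Vector:
    """Aggregate per-keyframe features into one segment-level vector."""
    n = len(keyframes)
    return [sum(dim) / n for dim in zip(*keyframes)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_targets(
    anchor_keyframes: Sequence[Sequence[float]],
    targets: Dict[str, Sequence[Sequence[float]]],
) -> List[str]:
    """Order target segments by similarity to the anchor, best first."""
    anchor = mean_pool(anchor_keyframes)
    scores = {
        name: cosine(anchor, mean_pool(frames))
        for name, frames in targets.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy data: one anchor and two candidate target segments, two keyframes each.
anchor = [[1.0, 0.0], [0.8, 0.2]]
targets = {
    "segment_1": [[0.0, 1.0], [0.1, 0.9]],
    "segment_2": [[0.9, 0.1], [1.0, 0.0]],
}
ranking = rank_targets(anchor, targets)
```

Any other aggregation function with the same signature could replace `mean_pool` here, which is the kind of comparison the abstract describes.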
