Semantic Similarity Based Video Retrieval

Jung, Minsoo; Park, Sung Han

doi:10.1007/978-3-642-02937-0_35

Cited by 3 publications

(5 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In contrast, we focus on more challenging unconstrained web data where leveraging multiple modalities and larger concept banks is important to build a robust system. While [2,16] both use a pre-defined concept ontology, we demonstrate the benefit of training indomain detectors in a data driven manner by discovering concepts from free form text descriptions.…”

Section: Related Workmentioning

confidence: 99%

See 1 more Smart Citation

Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts

Bondugula

Luisier

et al. 2014

2014 IEEE Conference on Computer Vision and Pattern Recognition

103

157

View full text Add to dashboard Cite

Current state-of-the-art systems for visual content analysis require large training sets for each class of interest, and performance degrades rapidly with fewer examples. In this paper, we present a general framework for the zeroshot learning problem of performing high-level event detection with no training exemplars, using only textual descriptions. This task goes beyond the traditional zero-shot framework of adapting a given set of classes with training data to unseen classes. We leverage video and image collections with free-form text descriptions from widely available web sources to learn a large bank of concepts, in addition to using several off-the-shelf concept detectors, speech, and video text for representing videos. We utilize natural language processing technologies to generate event description features. The extracted features are then projected to a common high-dimensional space using text expansion, and similarity is computed in this space. We present extensive experimental results on the large TRECVID MED [26] corpus to demonstrate our approach. Our results show that the proposed concept detection methods significantly outperform current attribute classifiers such as Classemes [34], ObjectBank [21], and SUN attributes [28]. Further, we find that fusion, both within as well as between modalities, is crucial for optimal performance.

show abstract

Section: Related Workmentioning

confidence: 99%

“…Video retrieval using semantic similarity has previously been explored in [2,16]. However, these approaches focus on highly structured broadcast data, where a small 374 concept pool [2] can be adequate.…”

Section: Related Workmentioning

confidence: 99%

Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts

Bondugula

Luisier

et al. 2014

2014 IEEE Conference on Computer Vision and Pattern Recognition

103

157

View full text Add to dashboard Cite

show abstract

“…In order to make this assembly simple and intuitive, we ensure that all of the attributes in our attribute vocabulary A have semantic meaning. This differentiates our approach significantly from others [1,19,9,46], which use Histogram of Gradient (HOG) or spatio-temporal interest point (STIP) features to describe their actions. While it is easy for a user to create a graph out of cars, colors, or people, it requires significant expertise to create a meaningful assembly out of HOG or STIP features.…”

Section: Semantic Graph Querymentioning

confidence: 99%

“…There are a number of recent works for video classification and retrieval [1,19,9,46,49] based on video semantic similarity with no prior knowledge or training data. Many of these works are based on exploiting and leveraging multiple modalities such as text, audio and OCR in combination with video.…”

Section: Related Workmentioning

confidence: 99%

Efficient Activity Retrieval through Semantic Graph Queries

Castañón

Chen

Zhang

et al. 2015

Proceedings of the 23rd ACM International Conference on Multimedia

View full text Add to dashboard Cite

We present an efficient retrieval approach for activity detection in large surveillance video datasets based on semantic graph queries. Unlike conventional approaches, our zero-shot retrieval method does not require knowledge of the activity classes contained in the video. We propose a novel user-centric approach that models queries through the creation of sparse semantic graphs based on attributes and discriminative relationships. We then pose search as a ranked subgraph matching problem and leverage the fact that the attributes and relationships in the query have different levels of discriminability to filter out bad matches. Rather than solving the NP-hard exact subgraph matching problem, we develop a novel maximally discriminative spanning tree (MDST) as the relaxation of a given query graph, and then describe a matching algorithm that recovers matches to this tree in linear time using maximally discriminative subgraph matching (MDSM). We utilize the MDST to minimize the number of possible matches to the original query while guaranteeing that the best matches are within this set. We test this algorithm on two large video datasets: the 35-GB Virat Ground dataset and a 1-TB aerial data collection from Yuma. These datasets yield graphs with 200,000 nodes and 1 million nodes, respectively, with an average degree of 5. Our approach finds complex, large-scale queries in seconds while maintaining comparable precision and recall to slower current approaches.

show abstract

“…Jung and S.H. Park [4] proposed a measure to overcome semantic gap in video retrieval. A. Anjulan and N. Canagarajah [5] described a unified framework for object retrieval and mining.…”

Section: Introductionmentioning

confidence: 99%

Video Object Retrieval Based on Color Feature Modeling

Huang

Cai

et al. 2010

2010 International Conference on Machine Vision and Human-Machine Interface

View full text Add to dashboard Cite

Retrieve of the objects in the videos is a challenging and important work. Non-textual object information stored in videos is presented as grids of numbers in the image flames. Therefore, it is hard to retrieve the object in the videos with classical methods. In this paper, a robust color-feature model of the moving video objects is proposed by converting the RGB pixels to a color circle of hue. Furthermore, we give a framework of video object retrieval based on the color feature model. Finally, the experimental results indicate that the proposed model is accurate and similar to human recognition of the moving objects in videos view, which demonstrates the good performance of the color-feature model.

show abstract

Semantic Similarity Based Video Retrieval

Cited by 3 publications

References 9 publications

Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts

Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts

Efficient Activity Retrieval through Semantic Graph Queries

Video Object Retrieval Based on Color Feature Modeling

Contact Info

Product

Resources

About