Content-based video genre classification using multiple cues

Ekenel, Hazım Kemal; Semela, Tomas; Stiefelhagen, Rainer

doi:10.1145/1877850.1877858

Cited by 31 publications

(42 citation statements)

References 10 publications

“…With the constant need to improve online video search, interesting research [6], [8], [12], [17], [23], [29] have been pursued that address shot classification from multiple perspectives: low-level textures, intensity, high-level objects and scenes etc. While these are meaningful at content level, they are unable to capture the ambient camera motion which replicates the narrative human eye and hence are far more semantically challenging.…”

Section: Fig 1 a Schematic Diagram Showing The Various Processes In

mentioning

confidence: 99%

“…These include: content based video search [12], film genre classification [8], [23] and video…”

mentioning

confidence: 99%

Classification of Cinematographic Shots Using Lie Algebra and its Application to Complex Event Recognition

Bhattacharya, Mehran, Sukthankar et al. 2014

IEEE Trans. Multimedia

Abstract: In this paper, we propose a discriminative representation of a video shot based on its camera motion and demonstrate how the representation can be used for high level multimedia tasks like complex event recognition. In our technique, we assume that a homography exists between a pair of subsequent frames in a given shot. Using purely image-based methods, we compute homography parameters that serve as coarse indicators of the ambient camera motion. Next, using Lie algebra, we map the homography matrices to an intermediate vector space that preserves the intrinsic geometric structure of the transformation. The mappings are stacked temporally to generate vector time-series per shot. To extract meaningful features from time-series, we propose an efficient linear dynamical system based technique. The extracted temporal features are further used to train linear SVMs as classifiers for a particular shot class. In addition to demonstrating the efficacy of our method on a novel dataset, we extend its applicability to recognize complex events in large scale videos under unconstrained scenarios. Our empirical evaluations on eight cinematographic shot classes show that our technique performs close to approaches that involve extraction of 3-D trajectories using computationally prohibitive structure from motion techniques.
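
The mapping from homographies to a vector space described in this abstract can be illustrated with a matrix logarithm. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes each inter-frame homography is close to the identity (so a truncated power series for the logarithm converges) and that scaling to unit determinant is an acceptable normalization. All function names are hypothetical.

```python
# Sketch: map 3x3 inter-frame homographies to Lie-algebra vectors.
# Assumption: H is near the identity, so the series
#   log(G) = sum_{k>=1} (-1)^(k+1) (G - I)^k / k
# converges. This is an illustration, not the paper's code.

def mat_mul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def normalize_det(h):
    """Scale H to unit determinant (a homography is defined up to scale)."""
    d = det3(h)
    if d <= 0:
        raise ValueError("expected a homography with positive determinant")
    s = d ** (-1.0 / 3.0)
    return [[s * x for x in row] for row in h]


def matrix_log_series(g, terms=60):
    """Truncated power series for log(G); valid only for G near identity."""
    a = [[g[i][j] - (1.0 if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    result = [[0.0] * 3 for _ in range(3)]
    power = a  # holds (G - I)^k
    for k in range(1, terms + 1):
        coeff = (1.0 if k % 2 == 1 else -1.0) / k
        for i in range(3):
            for j in range(3):
                result[i][j] += coeff * power[i][j]
        power = mat_mul(power, a)
    return result


def homography_to_vector(h):
    """Flatten log(H / det(H)^(1/3)) row by row into a 9-element list.

    The log of a unit-determinant matrix has zero trace, so only
    8 of the 9 entries are independent.
    """
    log_h = matrix_log_series(normalize_det(h))
    return [x for row in log_h for x in row]


def shot_time_series(homographies):
    """Stack per-frame-pair vectors in temporal order for one shot."""
    return [homography_to_vector(h) for h in homographies]
```

For a pure translation the logarithm is simply the translation offsets placed in the third column, which makes a convenient sanity check. A production version would more likely use a library routine for the matrix logarithm instead of a hand-written series.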

“…Much research work has attempted to automatically classify an entire video clip into one of several categories, such as sports, news, cartoon, music. In general, the previous methods can be categorized into four types: text-based approaches [1,19], audio feature based approaches [11,12,13,14], visual feature based approaches [5,16,18], and those that used some combination of text, audio and visual features [4,5,8]. In fact, most authors incorporated audio and visual features into their approaches (we call it content-based approaches); therefore in general, most approaches employ more than one modality.…”

Section: Video That Is Recorded By An Amateur Without Any Professiona

mentioning

confidence: 99%

“…Some approaches combined all features into a single feature vector while others trained classifiers for each modality and then used another classifier for making the final decision. In [4], beside audio features, visual features including colour and texture descriptors were used. R. Glasberg et al also used a motion activity descriptor and shot transition descriptor in [5].…”

Section: Video That Is Recorded By An Amateur Without Any Professiona

mentioning

confidence: 99%
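
The two fusion strategies contrasted in the excerpt above (one combined feature vector versus per-modality classifiers whose outputs are merged) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the weighted sum of scores is a simple stand-in for the second-stage classifier that the excerpt mentions.

```python
# Sketch of early vs. late fusion for multimodal genre classification.
# Assumption: per-modality classifiers already exist and return a score
# per genre; the weighted sum below stands in for a learned combiner.

def early_fusion(audio_features, visual_features):
    """Concatenate modality features into one vector for a single classifier."""
    return list(audio_features) + list(visual_features)


def late_fusion(scores_by_modality, weights):
    """Combine per-modality genre scores with a weighted sum.

    scores_by_modality: {"audio": {"news": 0.4, ...}, "visual": {...}}
    weights: {"audio": 0.5, "visual": 0.5}
    Returns the top-scoring genre and the fused score table.
    """
    fused = {}
    for modality, scores in scores_by_modality.items():
        w = weights[modality]
        for genre, score in scores.items():
            fused[genre] = fused.get(genre, 0.0) + w * score
    best = max(fused, key=fused.get)
    return best, fused
```

Early fusion lets a single classifier model interactions between modalities, while late fusion keeps each modality's classifier independent and makes it easy to drop or reweight one of them.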

Short user-generated videos classification using accompanied audio categories

Guo, Gurrin 2012

Proceedings of the 2012 ACM International Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis

This paper investigates the classification of short user-generated videos (UGVs) using the accompanied audio data, since short UGVs account for a great proportion of Internet UGVs and many short UGVs are accompanied by single-category soundtracks. We define seven types of UGVs corresponding to seven audio categories respectively. We also investigate three modeling approaches for audio feature representation, namely, single Gaussian (1G), Gaussian mixture (GMM) and Bag-of-Audio-Word (BoAW) models. Then using Support Vector Machine (SVM) with three different distance measurements corresponding to three feature representations, classifiers are trained to categorize the UGVs. The accompanying evaluation results show that these approaches are effective for categorizing the short UGVs based on their audio track. Experimental results show that a GMM representation with approximated Bhattacharyya distance (ABD) measurement produces the best performance, and BoAW representation with χ² kernel also reports comparable results.
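
As background for the Bhattacharyya-based measurement named in this abstract, the closed-form Bhattacharyya distance between two Gaussians can be sketched as below. This is an assumption-laden illustration: it covers only the single-Gaussian case with diagonal covariances, it is not the approximated distance (ABD) the paper uses for GMMs, and the abstract does not state which distance is paired with the 1G model. The function name is hypothetical.

```python
import math

# Sketch: Bhattacharyya distance between two diagonal-covariance Gaussians.
#   D = 1/8 * sum_i (m1_i - m2_i)^2 / s_i
#     + 1/2 * sum_i ln( s_i / sqrt(v1_i * v2_i) ),  with s_i = (v1_i + v2_i) / 2
# Not the paper's ABD for mixtures; illustrative only.

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Distance between N(mean1, diag(var1)) and N(mean2, diag(var2))."""
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same dimension")
    mean_term = 0.0
    cov_term = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        if v1 <= 0 or v2 <= 0:
            raise ValueError("variances must be positive")
        s = 0.5 * (v1 + v2)  # averaged variance for this dimension
        mean_term += (m1 - m2) ** 2 / s
        cov_term += math.log(s / math.sqrt(v1 * v2))
    return 0.125 * mean_term + 0.5 * cov_term
```

The distance is zero for identical Gaussians and grows as the means separate or the variances diverge, which is what makes it usable as a dissimilarity between two audio tracks each summarized by one Gaussian.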

“…Gross image features such as motion and color were used to classify video genre, along with a decision tree classifier [18] concentrated on background or camera motion and the foreground object motion using Gaussian Mixture Model (GMM) as the classifier [16]. Ekenel et al addressed the problem of video genre classification for five classes with a set of visual features, with SVM used for classification [4]. They used temporal and spatial information to build an HMM classifier.…”

Section: Related Work

mentioning

confidence: 99%