2010
DOI: 10.1109/tmm.2010.2052027

Real-Time Visual Concept Classification

Abstract: As datasets grow increasingly large in content-based image and video retrieval, the computational efficiency of concept classification becomes important. This paper reviews techniques to accelerate concept classification and shows the trade-off between computational efficiency and accuracy. As a basis we use the Bag-of-Words algorithm, which led to the best performance scores in the 2008 TRECVID and PASCAL benchmarks. We divide the evaluation into three steps: (1) Descriptor Extraction, where we evaluate SIFT, SUR…

Cited by 120 publications (93 citation statements)
References 37 publications (89 reference statements)
“…In particular, we use a state-of-the-art Bag-of-Visual-Words classification framework which largely follows Uijlings et al. [2010] in order to learn the difference between positive and negative paintings.…”
Section: Proposed Methods
Citation type: mentioning (confidence: 99%)
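The statement above summarises the Bag-of-Visual-Words pipeline that follows Uijlings et al. [2010]. As a rough illustration only, here is a minimal Python sketch of the generic pipeline, not the cited papers' exact configuration; the random descriptor arrays, the vocabulary size of 64, and the linear kernel are placeholder assumptions.

```python
# Minimal Bag-of-Visual-Words sketch (illustrative only): cluster local
# descriptors into a visual vocabulary, encode each image as a histogram
# of visual words, and train an SVM on those histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for dense SIFT/SURF descriptors: one array of
# 128-D local descriptors per training image.
train_descriptors = [rng.random((200, 128)) for _ in range(20)]
train_labels = np.array([0] * 10 + [1] * 10)   # e.g. "negative"/"positive" images

# 1) Build the visual vocabulary by clustering all local descriptors.
vocab_size = 64                                # real systems use e.g. 4,096 words
kmeans = KMeans(n_clusters=vocab_size, n_init=3, random_state=0)
kmeans.fit(np.vstack(train_descriptors))

# 2) Encode each image as an L1-normalised histogram of visual words.
def bow_histogram(descriptors):
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X_train = np.array([bow_histogram(d) for d in train_descriptors])

# 3) Train a classifier on the histograms (linear kernel for brevity;
#    histogram intersection or chi-square kernels are common in practice).
clf = SVC(kernel="linear").fit(X_train, train_labels)
```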
“…Further, several works have reported that dense sampling, which uniformly selects local 2D image patches or 3D video volumes, can be adopted in place of the expensive sparse keypoint detectors (e.g., DoG [88]) with a competitive recognition performance [101]. Uijlings et al [142] observed that the dense SIFT and dense SURF descriptors can be computed more efficiently with careful implementations that avoid repetitive computations of pixel responses in overlapping regions of nearby image patches.…”
Section: Scalability and Efficiency
Citation type: mentioning (confidence: 99%)
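A rough sketch of the efficiency idea attributed to Uijlings et al. [142] above: per-pixel orientation responses are computed once and reused by every overlapping patch via integral images, so dense sampling does not recompute them per patch. This is a generic illustration of the principle, not the paper's actual dense SIFT/SURF implementation; the image size, grid step, and cell size are assumed values.

```python
# Per-pixel responses (gradient magnitudes binned by orientation) are computed
# once; integral images then give the histogram of any cell in O(1), so
# overlapping dense patches share the precomputed work.
import numpy as np

def orientation_bins(image, n_bins=8):
    """Per-pixel orientation planes, computed once for the whole image."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    planes = np.zeros(image.shape + (n_bins,))
    rows, cols = np.indices(image.shape)
    planes[rows, cols, bins] = mag
    return planes

def integral(planes):
    """2-D integral image of each orientation plane."""
    return planes.cumsum(axis=0).cumsum(axis=1)

def cell_histogram(ii, top, left, size):
    """Orientation histogram of a square cell in O(1) from the integral image."""
    b, r = top + size - 1, left + size - 1
    total = ii[b, r].copy()
    if top > 0:
        total -= ii[top - 1, r]
    if left > 0:
        total -= ii[b, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

image = np.random.default_rng(1).random((64, 64))        # hypothetical grayscale image
ii = integral(orientation_bins(image))

# Dense sampling on a regular grid: overlapping cells reuse the same integral
# image instead of recomputing pixel responses patch by patch.
step, cell = 8, 16
grid = [(y, x) for y in range(0, 64 - cell + 1, step)
               for x in range(0, 64 - cell + 1, step)]
descriptors = np.array([cell_histogram(ii, y, x, cell) for y, x in grid])
```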
“…The classification process of an SVM can be slow when nonlinear kernels such as histogram intersection and χ2 are adopted. Maji et al. [81] proposed an interesting idea with which the histogram intersection and χ2 kernels can be computed with logarithmic complexity; Uijlings et al. [142] tested this method on video concept detection tasks and observed satisfying performance in both precision and speed. Recently, Jiang [53] conducted an extensive evaluation of the efficiency of features and classifier kernels in video event recognition.…”
Section: Scalability and Efficiency
Citation type: mentioning (confidence: 99%)
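For context, the histogram intersection kernel mentioned above can be plugged into an SVM as a precomputed kernel matrix, as in the sketch below. Maji et al.'s speed-up is only paraphrased in a comment, not implemented, and the data sizes and labels are synthetic placeholders.

```python
# Histogram intersection kernel used with an SVM via a precomputed kernel
# matrix. Maji et al.'s trick (not implemented here) rearranges the decision
# function so each histogram dimension can be evaluated with a binary search
# over sorted support-vector values, i.e. logarithmic in the number of
# support vectors instead of linear.
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(A, B):
    """K[i, j] = sum_d min(A[i, d], B[j, d]) for L1-normalised histograms."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

rng = np.random.default_rng(2)
X_train = rng.random((40, 64)); X_train /= X_train.sum(axis=1, keepdims=True)
X_test = rng.random((5, 64));   X_test /= X_test.sum(axis=1, keepdims=True)
y_train = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="precomputed")
clf.fit(intersection_kernel(X_train, X_train), y_train)
pred = clf.predict(intersection_kernel(X_test, X_train))
```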
“…For this task, we extract a visual vocabulary of 4,096 words. The keypoints are extracted with a dense sampling strategy and described using rgbSIFT features [28]. Descriptors are extracted at two different spatial scales of a spatial pyramidal image representation (entire image and quadrants).…”
Section: Content Description
Citation type: mentioning (confidence: 99%)
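The two-level spatial pyramid described in that statement (whole image plus quadrants) can be sketched as follows; the keypoint positions, image size, and word assignments are synthetic, and only the 4,096-word vocabulary size is taken from the quoted setup.

```python
# Two-level spatial pyramid: one bag-of-words histogram over the whole image
# plus one per quadrant, concatenated into a single feature vector.
import numpy as np

def spatial_pyramid_bow(words, xs, ys, width, height, vocab_size):
    """words: visual-word index per keypoint; xs, ys: keypoint coordinates."""
    def hist(mask):
        h = np.bincount(words[mask], minlength=vocab_size).astype(float)
        return h / max(h.sum(), 1.0)                     # guard empty regions

    all_kp = np.ones_like(words, dtype=bool)
    left, top = xs < width / 2, ys < height / 2
    quadrants = [left & top, ~left & top, left & ~top, ~left & ~top]
    return np.concatenate([hist(all_kp)] + [hist(q) for q in quadrants])

rng = np.random.default_rng(3)
vocab_size = 4096                                        # as in the cited setup
n_keypoints = 500                                        # dense grid in practice
words = rng.integers(0, vocab_size, size=n_keypoints)
xs = rng.uniform(0, 640, size=n_keypoints)
ys = rng.uniform(0, 480, size=n_keypoints)

feature = spatial_pyramid_bow(words, xs, ys, 640, 480, vocab_size)
print(feature.shape)                                     # (5 * 4096,) = (20480,)
```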
“…Some of the most efficient approaches use feature points, e.g., Scale Invariant Feature Transform (SIFT) [28], Space-Time Interest Points (STIP) [1], Histogram of oriented Gradients (HoG), 3D-SIFT [2], and Bag-of-Visual-Words representations [3]. These methods are, however, known to be very computationally expensive due to the computation of the visual word dictionaries.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
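The dictionary-construction cost flagged in that statement comes from clustering very large pools of local descriptors. A common mitigation, shown here purely as an assumed example rather than the cited papers' method, is mini-batch k-means on a random subsample of descriptors.

```python
# Building a visual vocabulary with mini-batch k-means on a subsample of the
# descriptor pool, instead of full k-means over all descriptors.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(4)
descriptors = rng.random((50_000, 128))      # hypothetical pool of SIFT-like descriptors

# Subsample, then cluster with mini-batch updates.
sample = descriptors[rng.choice(len(descriptors), size=20_000, replace=False)]
vocab = MiniBatchKMeans(n_clusters=256, batch_size=1024, n_init=3,
                        random_state=0).fit(sample)

# The cluster centres are the visual words used to quantise descriptors
# from new images.
word_ids = vocab.predict(descriptors[:10])
```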