2010
DOI: 10.1109/tmm.2010.2052027

Real-Time Visual Concept Classification

Abstract: As datasets grow increasingly large in content-based image and video retrieval, the computational efficiency of concept classification becomes important. This paper reviews techniques to accelerate concept classification and shows the trade-off between computational efficiency and accuracy. As a basis we use the Bag-of-Words algorithm, which led to the best performance scores in the 2008 TRECVID and PASCAL benchmarks. We divide the evaluation into three steps: (1) Descriptor Extraction, where we evaluate SIFT, SUR…

Cited by 120 publications (93 citation statements)
References 37 publications (89 reference statements)
“…In particular, we use a state-of-the-art Bag-of-Visual-Words classification framework which largely follows Uijlings et al. [2010] in order to learn the difference between positive and negative paintings.…”
Section: Proposed Methods
Citation type: mentioning (confidence: 99%)
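The statement above summarises the Bag-of-Visual-Words pipeline that follows Uijlings et al. [2010]. As a rough illustration only, here is a minimal Python sketch of the generic pipeline, not the cited papers' exact configuration; the random descriptor arrays, the vocabulary size of 64, and the linear kernel are placeholder assumptions.

```python
# Minimal Bag-of-Visual-Words sketch (illustrative only): cluster local
# descriptors into a visual vocabulary, encode each image as a histogram
# of visual words, and train an SVM on those histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for dense SIFT/SURF descriptors: one array of
# 128-D local descriptors per training image.
train_descriptors = [rng.random((200, 128)) for _ in range(20)]
train_labels = np.array([0] * 10 + [1] * 10)   # e.g. "negative"/"positive" images

# 1) Build the visual vocabulary by clustering all local descriptors.
vocab_size = 64                                # real systems use e.g. 4,096 words
kmeans = KMeans(n_clusters=vocab_size, n_init=3, random_state=0)
kmeans.fit(np.vstack(train_descriptors))

# 2) Encode each image as an L1-normalised histogram of visual words.
def bow_histogram(descriptors):
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()

X_train = np.array([bow_histogram(d) for d in train_descriptors])

# 3) Train a classifier on the histograms (linear kernel for brevity;
#    histogram intersection or chi-square kernels are common in practice).
clf = SVC(kernel="linear").fit(X_train, train_labels)
```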
“…Further, several works have reported that dense sampling, which uniformly selects local 2D image patches or 3D video volumes, can be adopted in place of the expensive sparse keypoint detectors (e.g., DoG [88]) with a competitive recognition performance [101]. Uijlings et al [142] observed that the dense SIFT and dense SURF descriptors can be computed more efficiently with careful implementations that avoid repetitive computations of pixel responses in overlapping regions of nearby image patches.…”
Section: Scalability and Efficiency
Citation type: mentioning (confidence: 99%)
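A rough sketch of the efficiency idea attributed to Uijlings et al. [142] above: per-pixel orientation responses are computed once and reused by every overlapping patch via integral images, so dense sampling does not recompute them per patch. This is a generic illustration of the principle, not the paper's actual dense SIFT/SURF implementation; the image size, grid step, and cell size are assumed values.

```python
# Per-pixel responses (gradient magnitudes binned by orientation) are computed
# once; integral images then give the histogram of any cell in O(1), so
# overlapping dense patches share the precomputed work.
import numpy as np

def orientation_bins(image, n_bins=8):
    """Per-pixel orientation planes, computed once for the whole image."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    planes = np.zeros(image.shape + (n_bins,))
    rows, cols = np.indices(image.shape)
    planes[rows, cols, bins] = mag
    return planes

def integral(planes):
    """2-D integral image of each orientation plane."""
    return planes.cumsum(axis=0).cumsum(axis=1)

def cell_histogram(ii, top, left, size):
    """Orientation histogram of a square cell in O(1) from the integral image."""
    b, r = top + size - 1, left + size - 1
    total = ii[b, r].copy()
    if top > 0:
        total -= ii[top - 1, r]
    if left > 0:
        total -= ii[b, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

image = np.random.default_rng(1).random((64, 64))        # hypothetical grayscale image
ii = integral(orientation_bins(image))

# Dense sampling on a regular grid: overlapping cells reuse the same integral
# image instead of recomputing pixel responses patch by patch.
step, cell = 8, 16
grid = [(y, x) for y in range(0, 64 - cell + 1, step)
               for x in range(0, 64 - cell + 1, step)]
descriptors = np.array([cell_histogram(ii, y, x, cell) for y, x in grid])
```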
“…The classification process of an SVM can be slow when nonlinear kernels such as histogram intersection and χ2 are adopted. Maji et al. [81] proposed an interesting idea with which the histogram intersection and χ2 kernels can be computed with logarithmic complexity; Uijlings et al. [142] tested this method on video concept detection tasks and observed satisfying performance in both precision and speed. Recently, Jiang [53] conducted an extensive evaluation of the efficiency of features and classifier kernels in video event recognition.…”
Section: Scalability and Efficiency
Citation type: mentioning (confidence: 99%)
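For context, the histogram intersection kernel mentioned above can be plugged into an SVM as a precomputed kernel matrix, as in the sketch below. Maji et al.'s speed-up is only paraphrased in a comment, not implemented, and the data sizes and labels are synthetic placeholders.

```python
# Histogram intersection kernel used with an SVM via a precomputed kernel
# matrix. Maji et al.'s trick (not implemented here) rearranges the decision
# function so each histogram dimension can be evaluated with a binary search
# over sorted support-vector values, i.e. logarithmic in the number of
# support vectors instead of linear.
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(A, B):
    """K[i, j] = sum_d min(A[i, d], B[j, d]) for L1-normalised histograms."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

rng = np.random.default_rng(2)
X_train = rng.random((40, 64)); X_train /= X_train.sum(axis=1, keepdims=True)
X_test = rng.random((5, 64));   X_test /= X_test.sum(axis=1, keepdims=True)
y_train = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="precomputed")
clf.fit(intersection_kernel(X_train, X_train), y_train)
pred = clf.predict(intersection_kernel(X_test, X_train))
```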
“…For this task, we extract a visual vocabulary of 4,096 words. The keypoints are extracted with a dense sampling strategy and described using rgbSIFT features [28]. Descriptors are extracted at two different spatial scales of a spatial pyramidal image representation (entire image and quadrants).…”
Section: Content Description
Citation type: mentioning (confidence: 99%)
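The two-level spatial pyramid described in that statement (whole image plus quadrants) can be sketched as follows; the keypoint positions, image size, and word assignments are synthetic, and only the 4,096-word vocabulary size is taken from the quoted setup.

```python
# Two-level spatial pyramid: one bag-of-words histogram over the whole image
# plus one per quadrant, concatenated into a single feature vector.
import numpy as np

def spatial_pyramid_bow(words, xs, ys, width, height, vocab_size):
    """words: visual-word index per keypoint; xs, ys: keypoint coordinates."""
    def hist(mask):
        h = np.bincount(words[mask], minlength=vocab_size).astype(float)
        return h / max(h.sum(), 1.0)                     # guard empty regions

    all_kp = np.ones_like(words, dtype=bool)
    left, top = xs < width / 2, ys < height / 2
    quadrants = [left & top, ~left & top, left & ~top, ~left & ~top]
    return np.concatenate([hist(all_kp)] + [hist(q) for q in quadrants])

rng = np.random.default_rng(3)
vocab_size = 4096                                        # as in the cited setup
n_keypoints = 500                                        # dense grid in practice
words = rng.integers(0, vocab_size, size=n_keypoints)
xs = rng.uniform(0, 640, size=n_keypoints)
ys = rng.uniform(0, 480, size=n_keypoints)

feature = spatial_pyramid_bow(words, xs, ys, 640, 480, vocab_size)
print(feature.shape)                                     # (5 * 4096,) = (20480,)
```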
“…Some of the most efficient approaches use feature points, e.g., Scale Invariant Feature Transform (SIFT) [28], Space-Time Interest Points (STIP) [1], Histogram of oriented Gradients (HoG), 3D-SIFT [2], and Bag-of-Visual-Words representations [3]. These methods are, however, known to be very computationally expensive due to the computation of the visual word dictionaries.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
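The dictionary-construction cost flagged in that statement comes from clustering very large pools of local descriptors. A common mitigation, shown here purely as an assumed example rather than the cited papers' method, is mini-batch k-means on a random subsample of descriptors.

```python
# Building a visual vocabulary with mini-batch k-means on a subsample of the
# descriptor pool, instead of full k-means over all descriptors.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(4)
descriptors = rng.random((50_000, 128))      # hypothetical pool of SIFT-like descriptors

# Subsample, then cluster with mini-batch updates.
sample = descriptors[rng.choice(len(descriptors), size=20_000, replace=False)]
vocab = MiniBatchKMeans(n_clusters=256, batch_size=1024, n_init=3,
                        random_state=0).fit(sample)

# The cluster centres are the visual words used to quantise descriptors
# from new images.
word_ids = vocab.predict(descriptors[:10])
```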