Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study

Zhang, Jianguo; Marszałek, Marcin; Lazebnik, Svetlana; Schmid, Cordelia

doi:10.1007/s11263-006-9794-4

Cited by 1,708 publications

(1,308 citation statements)

References 52 publications

Citation types: Supporting, Mentioning (1,255), Contrasting, Unclassified


“…This is an analogue of the human visual pattern recognition system, which is extremely proficient at identifying the damage patterns regardless of the scale and complexity of the scene. In the field of computer vision, various methods have been reported for pattern recognition tasks in various applications, such as object categorization, face recognition, and natural scene classification [12][13][14]. These methods are mostly based on supervised learning approaches, which work well for conventional image classification applications.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, the overall performance of the learning approach completely depends on the discriminative power of the image descriptors (features) considered for the classification [15]. Generally, images are described through either global (e.g., textures) or local features, like point descriptors such as Scale Invariant Feature Transform (SIFT) [13,16]. However, most global features are very sensitive to scale and clutter [17].…”

Section: Introduction (mentioning)

confidence: 99%


Identification of Structurally Damaged Areas in Airborne Oblique Images Using a Visual-Bag-of-Words Approach

et al. 2016


Automatic post-disaster mapping of building damage using remote sensing images is an important and time-critical element of disaster management. The characteristics of remote sensing images available immediately after a disaster are uncertain, since they may vary in terms of capturing platform, sensor view, image scale, and scene complexity. Therefore, a generalized method for damage detection that is robust to these image characteristics is desirable. This study aims to develop a method to perform grid-level damage classification of remote sensing images by detecting the damage corresponding to debris, rubble piles, and heavy spalling within a defined grid, regardless of the aforementioned image characteristics. The Visual-Bag-of-Words (BoW) is one of the most widely used and proven frameworks for image classification in the field of computer vision. The framework adopts a feature representation strategy that has been shown to be more efficient for image classification, regardless of scale and clutter, than conventional global feature representations. In this study, supervised models using various radiometric descriptors (histogram of gradient orientations (HoG) and Gabor wavelets) and classifiers (SVM, Random Forests, and AdaBoost) were developed for damage classification based on both BoW and conventional global feature representations, and tested with four datasets that vary according to the aforementioned image characteristics. The BoW framework outperformed conventional global feature representation approaches in all scenarios (i.e., for all combinations of feature descriptors, classifiers, and datasets), and produced an average accuracy of approximately 90%. Particularly encouraging was an accuracy improvement of 14 percentage points (from 77% to 91%) produced by BoW over global representation for the most complex dataset, which was used to test the generalization capability.
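The abstract above names the BoW representation without spelling out the encoding step. As a rough illustration only, the sketch below assigns each local descriptor to its nearest codeword and builds a normalized histogram. The function name `bow_histogram`, the toy codebook, and the use of plain Euclidean nearest-neighbor assignment are assumptions made for this example; they are not taken from the cited paper, which may build and encode its vocabulary differently.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Encode local descriptors as a normalized bag-of-words histogram.

    descriptors: (n, d) array of local features (hypothetical input).
    codebook:    (k, d) array of visual words, e.g. k-means centers (assumed given).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: index of the nearest codeword for each descriptor.
    words = d2.argmin(axis=1)
    # Count occurrences of each visual word and normalize to sum to 1.
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage with a two-word codebook in 2-D.
toy_codebook = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[0.0, 1.0], [9.0, 9.0], [10.0, 11.0], [1.0, 0.0]]
toy_hist = bow_histogram(toy_descriptors, toy_codebook)
```

The resulting histogram has a fixed length regardless of how many descriptors an image produces, which is what lets a standard classifier consume it.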


“…In all cases, the final image representation is based on a spatial pyramid of three levels (1 × 1, 2 × 2, and 3 × 3), yielding a total of 14 cells (Lazebnik et al., 2006). For classification, we use a nonlinear SVM with a χ² kernel (Zhang et al., 2007). For classifier fusion we use the addition of different kernel responses since in all our experiments it was shown to provide superior results compared to multiplication of kernels.…”

Section: Coloring Action Classification (mentioning)

confidence: 99%

Coloring Action Recognition in Still Images

et al. 2013


In this article we investigate the problem of human action recognition in static images. By action recognition we intend a class of problems which includes both action classification and action detection (i.e., simultaneous localization and classification). Bag-of-words image representations yield promising results for action classification, and deformable part models perform very well for object detection. The representations for action recognition typically use only shape cues and ignore color information. Inspired by the recent success of color in image classification and object detection, we investigate the potential of color for action classification and detection in static images. We perform a comprehensive evaluation of color descriptors and fusion approaches for action recognition. Experiments were conducted on the three datasets most used for benchmarking action recognition in still images: Willow, PASCAL VOC 2010 and Stanford-40. Our experiments demonstrate that incorporating color information considerably improves recognition performance, and that a descriptor based on color names outperforms pure color descriptors. Our experiments demonstrate that late fusion of color and shape information outperforms other approaches on action recognition. Finally, we show that the different color-shape fusion approaches result in complementary information and combining them yields state-of-the-art performance for action classification.
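The citation statement for this paper mentions a three-level spatial pyramid with 14 cells, a χ² kernel SVM, and fusion by adding kernel responses. The sketch below checks the cell count (1 + 4 + 9 = 14) and shows one common form of the χ² kernel, K = exp(-D / A), where D is the χ² distance and A is a scale set here to the mean distance. The function names, the `eps` guard, and the default choice of A are assumptions for illustration; the citing paper does not state its exact kernel parameters.

```python
import numpy as np

def pyramid_cell_count(grid_sizes=(1, 2, 3)):
    """Total cells in a spatial pyramid with one g x g grid per level."""
    return sum(g * g for g in grid_sizes)

def chi2_distance(X, Y, eps=1e-10):
    """Pairwise distance D[i, j] = 0.5 * sum_k (x_k - y_k)^2 / (x_k + y_k).

    X: (n, d) and Y: (m, d) non-negative histograms. eps (an assumption)
    guards against division by zero for bins that are empty in both.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    diff = X[:, None, :] - Y[None, :, :]
    total = X[:, None, :] + Y[None, :, :]
    return 0.5 * np.sum(diff ** 2 / (total + eps), axis=2)

def chi2_kernel(X, Y, scale=None):
    """Kernel K = exp(-D / A); A defaults to the mean distance (assumed choice)."""
    D = chi2_distance(X, Y)
    if scale is None:
        mean_d = D.mean()
        scale = mean_d if mean_d > 0 else 1.0
    return np.exp(-D / scale)

def fuse_by_addition(kernels):
    """Combine several kernel matrices of equal shape by element-wise sum."""
    return np.sum(kernels, axis=0)

# Toy usage: two hypothetical cues (e.g. shape and color), 4 images, 14 bins each.
rng = np.random.default_rng(0)
shape_hists = rng.random((4, 14))
color_hists = rng.random((4, 14))
K_shape = chi2_kernel(shape_hists, shape_hists)
K_color = chi2_kernel(color_hists, color_hists)
K_fused = fuse_by_addition([K_shape, K_color])
```

A fused matrix like `K_fused` could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.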


“…In recent years significant progress has been made in the field of object detection and recognition [1]. While standard "scanning-window" methods attempt to localize objects independently, several recent approaches extend this work and exploit scene context as well as relations among objects for improved object detection [2]. Related ideas have been investigated for human motion analysis, where incorporating scene-level and behavioral factors affecting the spatial arrangement and movement of people has been shown to be beneficial for achieving improved detection and tracking accuracy.…”

Section: Introduction (mentioning)

confidence: 99%

Pedestrian Density Estimation by a Weighted Bag of Visual Words Model

Zhang, Zhang, 2015, IJMLC


Abstract: Pedestrian density estimation is useful and important in transportation environments. In this paper, we present a novel weighting scheme for the "bag of visual words" model for pedestrian density estimation, which characterizes both the weight and the relative spatial arrangement of all visual words in depicting an image. We first analyze the visual word generation process. By counting the number of images from which each visual word is clustered and computing the cluster radius of each visual word, we can give each visual word a weight. Specifically, the co-occurrences of visual words are computed with respect to spatial predicates over a hierarchical spatial partitioning of an image. The representation captures both the absolute and relative spatial arrangement of the words and, through the choice and combination of the predicates, can characterize a variety of spatial relationships. We validate this hypothesis using a challenging ground-truth pedestrian dataset. Our approach is shown to result in higher classification accuracy than a non-weighted bag-of-visual-words approach. The time our approach needs to generate the visual words is only 1/20 to 1/30 of the time required by the traditional image feature clustering process.
