Abstract. We introduce a new framework for image classification that extends beyond the window sampling of fixed spatial pyramids to include a comprehensive set of windows densely sampled over location, size and aspect ratio. To effectively deal with this large set of windows, we derive a concise high-level image feature using a two-level extraction method. At the first level, window-based features are computed from local descriptors (e.g., SIFT, spatial HOG, LBP) in a process similar to standard feature extractors. Then at the second level, the new image feature is determined from the window-based features in a manner analogous to the first level. This higher level of abstraction offers both efficient handling of dense samples and reduced sensitivity to misalignment. More importantly, our simple yet effective framework can readily accommodate a large number of existing pooling/coding methods, allowing them to extract features beyond the spatial pyramid representation. To effectively fuse the second level feature with a standard first level image feature for classification, we additionally propose a new learning algorithm, called Generalized Adaptive p-norm Multiple Kernel Learning (GA-MKL), to learn an adapted robust classifier based on multiple base kernels constructed from image features and multiple sets of prelearned classifiers of all the classes. Extensive evaluation on the object recognition (Caltech256) and scene recognition (15Scenes) benchmark datasets demonstrates that the proposed method outperforms state-ofthe-art image classification algorithms under a broad range of settings.
Abstract-As a large-scale database of hundreds of thousands of face images collected from the Internet and digital cameras becomes available, how to utilize it to train a well-performed face detector is a quite challenging problem. In this paper, we propose a method to resample a representative training set from a collected large-scale database to train a robust human face detector. First, in a high-dimensional space, we estimate geodesic distances between pairs of face samples/examples inside the collected face set by isometric feature mapping (Isomap) and then subsample the face set. After that, we embed the face set to a low-dimensional manifold space and obtain the low-dimensional embedding. Subsequently, in the embedding, we interweave the face set based on the weights computed by locally linear embedding (LLE). Furthermore, we resample nonfaces by Isomap and LLE likewise. Using the resulting face and nonface samples, we train an AdaBoost-based face detector and run it on a large database to collect false alarms. We then use the false detections to train a one-class support vector machine (SVM). Combining the AdaBoost and one-class SVM-based face detector, we obtain a stronger detector. The experimental results on the MIT + CMU frontal face test set demonstrated that the proposed method significantly outperforms the other state-of-the-art methods.
Aiming at the problem when both positive and negative training set are enormous, this paper proposes a novel Matrix-Structural Learning (MSL) method, as an extension to Viola and Jones' cascade learning method for object detection. Briefly speaking, unlike Viola and Jones' method that learn linearly by bootstrapping only negative samples, the proposed MSL method bootstraps both positive and negative samples in a matrix-like structure. Moreover, an accumulative way is further presented to improve the training efficiency of MSL by inheriting features learned previously during training procedure. The proposed method is evaluated on face detection problem. On a positive set containing 230,000 face samples, only 12 hours are needed on a common PC with a 3.20GHz Pentium IV processor to learn a classifier with false alarm rate less than 1/1,000,000. What's more, the accuracy of the learned detector exceeds the state-of-the-art results on the CMU+MIT frontal face test set.
We present a framework for image classification that extends beyond the window sampling of fixed spatial pyramids and is supported by a new learning algorithm. Based on the observation that fixed spatial pyramids sample a rather limited subset of the possible image windows, we propose a method that accounts for a comprehensive set of windows densely sampled over location, size, and aspect ratio. A concise high-level image feature is derived to effectively deal with this large set of windows, and this higher level of abstraction offers both efficient handling of the dense samples and reduced sensitivity to misalignment. In addition to dense window sampling, we introduce generalized adaptive l(p)-norm multiple kernel learning (GA-MKL) to learn a robust classifier based on multiple base kernels constructed from the new image features and multiple sets of prelearned classifiers from other classes. With GA-MKL, multiple levels of image features are effectively fused, and information is shared among different classifiers. Extensive evaluation on benchmark datasets for object recognition (Caltech256 and Caltech101) and scene recognition (15Scenes) demonstrate that the proposed method outperforms the state-of-the-art under a broad range of settings.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.