Designing effective features is a fundamental problem in computer vision, yet it is usually difficult to achieve a good tradeoff between discriminative power and robustness. Previous works have shown that spatial co-occurrence can boost the discriminative power of features. However, existing co-occurrence features pay little attention to robustness and hence suffer from sensitivity to geometric and photometric variations. In this work, we study the Transform Invariance (TI) of co-occurrence features. Concretely, we formally introduce a Pairwise Transform Invariance (PTI) principle, propose a novel Pairwise Rotation Invariant Co-occurrence Local Binary Pattern (PRICoLBP) feature, and further extend it to incorporate multi-scale, multi-orientation, and multi-channel information. Unlike other LBP variants, PRICoLBP not only captures spatial context co-occurrence information effectively but also possesses rotation invariance. We evaluate PRICoLBP comprehensively on nine benchmark data sets from five perspectives, namely encoding strategy, rotation invariance, the number of templates, speed, and discriminative power compared to other LBP variants. Furthermore, we apply PRICoLBP to six different but related applications, namely texture, material, flower, leaf, food, and scene classification, and demonstrate that PRICoLBP is efficient, effective, and achieves a well-balanced tradeoff between discriminative power and robustness.
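The co-occurrence idea behind such features can be illustrated with a short sketch. The snippet below builds a joint histogram of rotation-invariant uniform LBP codes taken at each pixel and at a fixed spatial offset, using scikit-image. It is only a simplified illustration of co-occurring LBP codes, not the published PRICoLBP, which selects the co-occurring point adaptively (following the PTI principle) rather than with a fixed offset; the function name, parameters, and offset are illustrative choices.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def cooccurrence_lbp_hist(image, P=8, R=1.0, offset=(0, 2)):
    """Joint histogram of LBP codes co-occurring at a fixed (non-negative) offset.

    Simplified sketch only: the actual PRICoLBP selects the co-occurring
    point per pixel so that the *pair* of codes is rotation invariant.
    """
    # Rotation-invariant uniform LBP codes, values in {0, ..., P+1}.
    codes = local_binary_pattern(image, P, R, method="uniform").astype(int)
    n = P + 2
    dy, dx = offset
    h, w = codes.shape
    # Pair the code at (row, col) with the code at (row + dy, col + dx).
    a = codes[: h - dy, : w - dx].ravel()
    b = codes[dy:, dx:].ravel()
    joint = np.bincount(a * n + b, minlength=n * n).astype(float)
    return joint / joint.sum()  # normalized co-occurrence histogram
```

The resulting (P + 2)²-bin histogram can be concatenated over scales or channels and fed to a standard classifier such as an SVM.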
Dynamic texture and scene classification by transferring deep image features (Neurocomputing). Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, existing approaches suffer from sensitivity to varying illumination, viewpoint changes, or camera motion, and/or from a lack of spatial information. Inspired by the success of deep structures in image classification, we attempt to leverage a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges in training a deep structure, we propose to transfer prior knowledge from the image domain to the video domain. To be more specific, we propose to apply a well-trained Convolutional Neural Network (ConvNet) as a feature extractor to extract mid-level features from each frame, and then form the video-level representation by concatenating the first- and second-order statistics over the mid-level features. We term this two-level feature extraction scheme the Transferred ConvNet Feature (TCoF). Moreover, we explore two different implementations of the TCoF scheme, i.e., the spatial TCoF and the temporal TCoF. In the spatial TCoF, the mean-removed frames are used as the inputs of the ConvNet, whereas in the temporal TCoF, the differences between two adjacent frames are used as the inputs. We systematically evaluate the proposed spatial and temporal TCoF schemes on three benchmark data sets, namely DynTex, YUPENN, and Maryland, and demonstrate that the proposed approach yields superior performance.
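A minimal sketch of the two-level TCoF pipeline might look as follows, assuming PyTorch and a recent torchvision (≥ 0.13 for the weights API) and using ResNet-18 as an arbitrary stand-in for the pretrained ConvNet. The paper's exact network and pooling are not reproduced; the per-dimension variance below is a simplification of the second-order statistics.

```python
import torch
import torchvision.models as models
from torchvision.transforms.functional import resize

# Any ImageNet-pretrained ConvNet can play the role of the frame-level
# feature extractor; ResNet-18 is an arbitrary stand-in, not the paper's net.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # expose the 512-d penultimate features
backbone.eval()

@torch.no_grad()
def tcof(frames, temporal=False):
    """frames: float tensor of shape (T, 3, H, W), values roughly in [0, 1].

    Spatial variant: mean-removed frames are fed to the ConvNet.
    Temporal variant: differences of adjacent frames are fed instead.
    """
    if temporal:
        x = frames[1:] - frames[:-1]                   # adjacent-frame differences
    else:
        x = frames - frames.mean(dim=0, keepdim=True)  # mean-removed frames
    feats = backbone(resize(x, [224, 224]))            # (T', 512) mid-level features
    # Video-level TCoF: concatenate first- and second-order statistics
    # (per-dimension mean and variance) over the frame features.
    return torch.cat([feats.mean(dim=0), feats.var(dim=0)])
```

The spatial and temporal representations produced this way can then be classified separately or combined, for example with a linear classifier over the concatenated vectors.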
Subspace clustering methods based on data self-expression have become very popular for learning from data that lie in a union of low-dimensional linear subspaces. However, the applicability of subspace clustering has been limited because practical visual data in raw form do not necessarily lie in such linear subspaces. On the other hand, while the Convolutional Neural Network (ConvNet) has been demonstrated to be a powerful tool for extracting discriminative features from visual data, training such a ConvNet usually requires a large amount of labeled data, which is unavailable in subspace clustering applications. To achieve simultaneous feature learning and subspace clustering, we propose an end-to-end trainable framework, called the Self-Supervised Convolutional Subspace Clustering Network (S²ConvSCN), that combines a ConvNet module (for feature learning), a self-expression module (for subspace clustering), and a spectral clustering module (for self-supervision) into a joint optimization framework. In particular, we introduce a dual self-supervision that exploits the output of spectral clustering to supervise the training of the feature learning module (via a classification loss) and the self-expression module (via a spectral clustering loss). Our experiments on four benchmark datasets show the effectiveness of the dual self-supervision and demonstrate the superior performance of our proposed approach.
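The self-expression module at the core of such frameworks can be sketched as a trainable coefficient matrix that reconstructs each feature vector from the others. The PyTorch snippet below is a generic sketch of this idea with a Frobenius-norm regularizer; the full S²ConvSCN additionally wraps it in a convolutional encoder/decoder and adds the spectral-clustering-based dual self-supervision losses, which are omitted here. The class and function names are illustrative.

```python
import torch
import torch.nn as nn

class SelfExpression(nn.Module):
    """Learn C with zero diagonal so that z_i ≈ sum_j C[i, j] * z_j."""

    def __init__(self, n_samples):
        super().__init__()
        # Small random initialization of the self-expression coefficients.
        self.C = nn.Parameter(1e-4 * torch.randn(n_samples, n_samples))

    def forward(self, z):                  # z: (n_samples, d) feature matrix
        c = self.C - torch.diag(torch.diag(self.C))  # enforce zero diagonal
        return c @ z, c                    # reconstruction CZ and coefficients

def self_expression_loss(z, z_hat, c, lam=1.0):
    # Reconstruction term ||Z - CZ||_F^2 plus a Frobenius regularizer on C.
    return ((z - z_hat) ** 2).sum() + lam * (c ** 2).sum()
```

At convergence, an affinity such as |C| + |C|ᵀ is typically passed to spectral clustering to obtain the segmentation; in S²ConvSCN that clustering output is fed back as the self-supervision signal described above.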