In this paper, we address the challenging problem of categorizing video sequences composed of dynamic natural scenes. In contrast to previous methods that rely on handcrafted descriptors, we propose to represent videos using unsupervised learning of motion features. Our method makes three main contributions: 1) Based on the Slow Feature Analysis principle, we introduce a learned local motion descriptor that represents the principal and more stable motion components of training videos. 2) We integrate our local motion feature into a global coding/pooling architecture in order to provide an effective signature for each video sequence. 3) We report state-of-the-art classification performance on two challenging natural-scene data sets. In particular, an outstanding improvement of 11% in classification score is reached on a data set introduced in 2012.
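The Slow Feature Analysis principle named in this abstract can be illustrated with a minimal linear sketch. This is not the authors' descriptor: the abstract does not give their formulation, so the function name `linear_sfa`, the whitening-then-eigendecomposition recipe, and the toy two-channel signal are all assumptions chosen for illustration.

```python
import numpy as np

def linear_sfa(x, n_components):
    """Linear Slow Feature Analysis on a time series x of shape (T, D).

    Hypothetical helper for illustration only. Returns the signal mean and
    a projection matrix whose outputs have unit variance and vary as
    slowly as possible over time.
    """
    # Center the signal.
    mean = x.mean(axis=0)
    xc = x - mean
    # Whiten: rotate and rescale so the covariance becomes the identity.
    eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
    whitener = eigvec / np.sqrt(eigval)  # scales column j by 1/sqrt(eigval[j])
    z = xc @ whitener
    # Covariance of the temporal differences of the whitened signal.
    dcov = np.cov(np.diff(z, axis=0), rowvar=False)
    # eigh returns eigenvalues in ascending order, so the first columns
    # are the directions with the smallest temporal variation.
    _, dvec = np.linalg.eigh(dcov)
    return mean, whitener @ dvec[:, :n_components]

# Toy data (an assumption): a slow and a fast sinusoid, linearly mixed.
t = np.linspace(0, 2 * np.pi, 500)
slow = np.sin(t)
fast = np.sin(25 * t)
x = np.column_stack([slow + fast, slow - fast])

mean, w = linear_sfa(x, n_components=1)
y = (x - mean) @ w  # slowest feature, recovered up to sign
```

On this toy signal the extracted feature should track the slow sinusoid up to sign, which is the sense in which SFA keeps the "more stable" components of a varying input.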
This paper presents an extension of the HMAX model, a neural network model for image classification. The HMAX model can be described as a four-level architecture with a first level consisting of multi-scale and multi-orientation local filters. We introduce two main contributions to this model. First, we improve the way the local filters at the first level are integrated into more complex filters at the last level, providing a flexible description of object regions, combining local information of multiple scales and orientations. These new filters are discriminative and yet invariant, two key aspects of visual classification. We evaluate their discriminative power and their level of invariance to geometrical transformations on a synthetic image set. Second, we introduce a multi-resolution spatial pooling. This pooling encodes both local and global spatial information to produce discriminative image signatures. Classification results are reported on three image data sets, Caltech101, Caltech256 and Fifteen Scenes. We show significant improvements over previous architectures using a similar framework.
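One way to picture a pooling that encodes both global and local spatial information is a spatial-pyramid-style max pooling, sketched below. The abstract does not specify the authors' scheme, so the grid sizes, the use of max as the pooling operator, and the name `pyramid_max_pool` are assumptions.

```python
import numpy as np

def pyramid_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (H, W, C) feature map over grids of increasing resolution.

    Hypothetical helper: level n splits the map into an n x n grid and
    keeps the per-channel maximum of each cell. Level 1 is global pooling;
    finer levels retain coarse spatial layout. Assumes H and W are at
    least as large as the finest grid size.
    """
    h, w, _ = feature_map.shape
    pooled = []
    for n in levels:
        # Integer cell boundaries for an n x n grid.
        rows = np.linspace(0, h, n + 1).astype(int)
        cols = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
                pooled.append(cell.max(axis=(0, 1)))
    # Concatenate all cells into a single image signature.
    return np.concatenate(pooled)

# Toy feature map (an assumption): 8 x 8 positions, 3 filter channels.
feature_map = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
signature = pyramid_max_pool(feature_map)
```

With the default levels, the signature holds 1 + 4 + 16 = 21 cells per channel, so global and local responses sit side by side in one vector.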
Learned Categorical Perception (CP) occurs when the members of different categories come to look more dissimilar (“between-category separation”) and/or members of the same category come to look more similar (“within-category compression”) after a new category has been learned. To measure learned CP and its physiological correlates we compared dissimilarity judgments and Event Related Potentials (ERPs) before and after learning to sort multi-featured visual textures into two categories by trial and error with corrective feedback. With the same number of training trials and feedback, about half the subjects succeeded in learning the categories (“Learners”: criterion 80% accuracy) and the rest did not (“Non-Learners”). At both lower and higher levels of difficulty, successful Learners showed significant between-category separation—and, to a lesser extent, within-category compression—in pairwise dissimilarity judgments after learning, compared to before; their late parietal ERP positivity (LPC, usually interpreted as decisional) also increased and their occipital N1 amplitude (usually interpreted as perceptual) decreased. LPC amplitude increased with response accuracy and N1 amplitude decreased with between-category separation for the Learners. Non-Learners showed no significant changes in dissimilarity judgments, LPC or N1, within or between categories. This is behavioral and physiological evidence that category learning can alter perception. We sketch a neural net model predictive of this effect.
This paper presents an improvement on a biologically inspired network for image classification. Previous models have used a multi-scale and multi-orientation architecture to gain robustness to transformations and to extract complex visual features. Our contribution to this type of architecture lies in building complex visual features that are better tuned to image structures. We allow the network to build complex features with richer information in terms of the local scales of image structures. Our classification results show significant improvements over previous architectures using the same framework.
Learning to categorize requires distinguishing category members from non-members by detecting the features that covary with membership. Human subjects were trained to sort visual textures into two categories by trial and error with corrective feedback. Difficulty levels were increased by decreasing the proportion of covariant features. Pairwise similarity judgments were tested before and after category learning. Three effects were observed: (1) The lower the proportion of covariant features, the more trials it took to learn the category and the fewer the subjects who succeeded in learning it. After training, (2) perceived pairwise distance increased between categories and, to a lesser extent, (3) decreased within categories, at all levels of difficulty, but only for successful learners. This perceived between-category separation and within-category compression is called categorical perception (CP). A very simple neural network model for category learning using uniform binary (0/1) features showed similar CP effects. CP may occur because learning to selectively detect covariant features and ignore non-covariant features reduces the dimensionality of perceived similarity space. In addition to (1) – (3), the nets showed (4) a strong negative correlation between the proportion of covariant features and the size of the CP effect. This correlation was not evident in the human subjects, probably because, unlike the formal binary features of the input to the nets, which were all uniform, the visual features of the human inputs varied in difficulty.
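The dimensionality-reduction account of CP in this abstract can be made concrete with a small numerical sketch. It is not the authors' network: it replaces learning with hard attention weights (1 on covariant features, 0 elsewhere), and the helper names, stimulus counts, and random seed are assumptions.

```python
import numpy as np

def make_stimuli(n_per_cat, n_features, n_covariant, rng):
    """Binary stimuli whose first n_covariant features equal the category label."""
    labels = np.repeat([0, 1], n_per_cat)
    x = rng.integers(0, 2, size=(2 * n_per_cat, n_features))
    x[:, :n_covariant] = labels[:, None]
    return x, labels

def mean_distances(x, labels, weights):
    """Mean weighted Hamming distance (within-category, between-category)."""
    w = weights / weights.sum()
    diff = np.abs(x[:, None, :] - x[None, :, :])  # (N, N, F) feature mismatches
    d = diff @ w                                  # (N, N) pairwise distances
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)   # exclude self-pairs
    return d[same & off_diag].mean(), d[~same].mean()

rng = np.random.default_rng(0)
x, labels = make_stimuli(n_per_cat=20, n_features=10, n_covariant=3, rng=rng)

# Before learning: every feature counts equally.
within_before, between_before = mean_distances(x, labels, np.ones(10))

# After learning (assumed): only the covariant features are attended.
attention = np.zeros(10)
attention[:3] = 1.0
within_after, between_after = mean_distances(x, labels, attention)
```

Ignoring the non-covariant features drives the within-category distance to zero and the between-category distance to one in this idealized setting, which mirrors the compression and separation described above.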