Person-independent and pose-invariant estimation of facial expressions and action unit (AU) intensities is important for situation analysis and for automated video annotation. We evaluated raw 2D shape data from the CK+ database, using Procrustes transformation for alignment and a multi-class SVM with leave-one-out evaluation for classification. We found close to 100% performance, demonstrating the relevance and strength of fine shape details. Precise 3D shape information was computed by means of Constrained Local Models (CLM) on video sequences. Such sequences offer the opportunity to compute a time-averaged '3D Personal Mean Shape' (PMS) from the estimated CLM shapes, which, upon subtraction, gives rise to person-independent emotion estimation. On CK+ data, PMS showed significant improvements over AU0 normalization; performance reached and sometimes surpassed state-of-the-art results on emotion classification and on AU intensity estimation. 3D PMS from 3D CLM offers pose-invariant emotion estimation, which we studied by rendering a 3D emotional database for different poses and different subjects from the BU 4DFE database. Frontal shapes derived from CLM fits of the 3D shape were evaluated. The results demonstrate that shape estimation alone can be used for robust, high-quality, pose-invariant emotion classification and AU intensity estimation.
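The normalization idea in this abstract (align each tracked shape, average over time to get a personal mean shape, then subtract it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of a generic reference shape, and the orthogonal Procrustes alignment via SVD are assumptions made for illustration, and the landmark arrays stand in for shapes that a CLM tracker would provide.

```python
import numpy as np

def align_to_reference(shape, ref):
    """Similarity-align `shape` (n_points x dim) to `ref` (orthogonal Procrustes)."""
    shape_c = shape - shape.mean(axis=0)   # remove translation
    ref_c = ref - ref.mean(axis=0)
    # SVD of the cross-covariance gives the optimal rotation.
    u, s, vt = np.linalg.svd(shape_c.T @ ref_c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    corr = np.ones(len(s))
    corr[-1] = np.sign(np.linalg.det(u @ vt))
    rot = (u * corr) @ vt
    scale = (s * corr).sum() / (shape_c ** 2).sum()
    return scale * shape_c @ rot + ref.mean(axis=0)

def personal_mean_shape(shapes, ref):
    """Time-averaged shape of one person after alignment (hypothetical helper)."""
    aligned = np.stack([align_to_reference(s, ref) for s in shapes])
    return aligned.mean(axis=0), aligned

def person_independent_features(shapes, ref):
    """Subtract the personal mean shape and flatten each frame into a vector."""
    pms, aligned = personal_mean_shape(shapes, ref)
    return (aligned - pms).reshape(len(shapes), -1)
```

The resulting per-frame vectors are the kind of input one would pass to a classifier such as a multi-class SVM; which reference shape to align against (for example, the first frame or a database mean) is a design choice that this sketch leaves open.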
Sensory representations are not only sparse, but often overcomplete: coding units significantly outnumber the input units. For models of neural coding, this overcompleteness poses a computational challenge, both for shaping the signal processing channels and for using the large and sparse representations efficiently. We argue that higher-level overcompleteness becomes computationally tractable by imposing sparsity on synaptic activity, and we also show that such structural sparsity can be facilitated by a statistics-based decomposition of the stimuli into typical and atypical parts prior to sparse coding. Typical parts represent large-scale correlations, so they can be significantly compressed. Atypical parts, on the other hand, represent local features and are the subjects of actual sparse coding. When applied to natural images, our decomposition-based sparse coding model can efficiently form overcomplete codes, and both center-surround and oriented filters are obtained, similar to those observed in the retina and the primary visual cortex, respectively. We therefore hypothesize that the proposed computational architecture can be seen as a coherent functional model of the first stages of sensory coding in early vision.
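To make the typical/atypical split concrete, the sketch below uses a low-rank PCA projection as a stand-in for the "typical" part and the residual as the "atypical" part. The abstract does not specify the decomposition used, so the PCA choice, the function name, and the parameter `n_components` are all assumptions made for illustration only.

```python
import numpy as np

def split_typical_atypical(X, n_components):
    """Split samples (rows of X) into a low-rank 'typical' part and a residual.

    Assumption: large-scale correlations are approximated by the leading
    principal components; this is a stand-in, not the paper's method.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    basis = vt[:n_components]
    codes = Xc @ basis.T              # compressed description of the typical part
    typical = codes @ basis + mean
    atypical = X - typical            # residual that would go on to sparse coding
    return typical, atypical, codes, basis
```

In this reading, only `codes` needs to be kept for the typical part, which is where the compression comes from, while `atypical` carries the local structure left for the sparse coder.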
Sparse coding algorithms aim to find a linear basis in which signals can be represented by a small number of active (non-zero) coefficients. Such coding has many applications in science and engineering and is believed to play an important role in neural information processing. However, due to the computational complexity of the task, only approximate solutions provide the required efficiency (in terms of time). As new results show, under particular conditions there exist efficient solutions that minimize the magnitude of the coefficients ('ℓ1-norm') instead of the size of the active subset of features ('ℓ0-norm'). A straightforward neural implementation of these solutions is not likely, as they require a priori knowledge of the number of active features. Furthermore, these methods rely on iterative re-evaluation of the reconstruction error, which in turn implies that final sparse forms (featuring 'population sparseness') can only be reached through the formation of a series of non-sparse representations, in contrast with the overall sparse functioning of neural systems ('lifetime sparseness'). In this article we present a novel algorithm that integrates our previous 'ℓ0-norm' model of spike-based probabilistic optimization for sparse coding with ideas coming from novel 'ℓ1-norm' solutions. The resulting algorithm allows a neurally plausible implementation and does not require an exactly defined sparseness level, so it is suitable for representing natural stimuli with a varying number of features. We also demonstrate that the combined method significantly extends the domain where optimal solutions can be found by 'ℓ1-norm' based algorithms.
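As an illustration of the 'ℓ1-norm' relaxation mentioned above, here is a minimal sketch of iterative soft thresholding (ISTA), a standard method that minimizes 0.5*||x - D a||^2 + lam*||a||_1. It is not the algorithm proposed in the article; the dictionary `D`, the penalty `lam`, and the iteration count are illustrative assumptions. Note that each step re-evaluates the reconstruction error, which is the property the abstract points out as hard to reconcile with neural implementations.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal map of the l1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam=0.05, n_iter=500):
    """Approximately minimize 0.5*||x - D a||^2 + lam*||a||_1 over a."""
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        residual = x - D @ a                      # reconstruction error
        a = soft_threshold(a + step * D.T @ residual, step * lam)
    return a

def l1_objective(D, x, a, lam):
    """Value of the l1-regularized reconstruction objective."""
    return 0.5 * np.sum((x - D @ a) ** 2) + lam * np.sum(np.abs(a))
```

Counting the non-zero entries of the returned vector (for example with `np.count_nonzero`) gives the 'ℓ0' size of the active set, which is the quantity the ℓ1 penalty stands in for.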