Abstract: This paper addresses the recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building on this key observation, we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Although this temporal self-similarity descriptor is not strictly view-invariant, we provide intuition and experimental validation demonstrating its high stability under view changes. Self-similarity descriptors are also shown to be stable under performance variations within a class of actions when individual speed fluctuations are ignored. If required, such fluctuations between two instances of the same action class can be recovered explicitly with dynamic time warping, as we demonstrate, to achieve cross-view action synchronization. More central to the present work, the temporal ordering of local self-similarity descriptors can simply be ignored within a bag-of-features approach; sufficient action discrimination is still retained this way to build a view-independent action recognition system. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multi-view correspondence estimation. Instead, it relies on weak geometric properties and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public datasets. It performs similarly to or better than related methods, and it performs well even in extreme conditions, such as recognizing actions from top views while using side views only for training.
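The core quantity described above, a matrix of temporal self-similarities within a sequence, can be sketched minimally as follows. This is an illustrative reconstruction, not the authors' exact implementation: the function name and the toy per-frame features are hypothetical, and a plain Euclidean distance between per-frame descriptors is assumed.

```python
import numpy as np

def self_similarity_matrix(features):
    """Pairwise Euclidean distances between per-frame feature vectors.

    features: (T, d) array, one d-dimensional descriptor per frame
    (e.g. joint positions or optical-flow statistics).
    Returns a symmetric (T, T) matrix with a zero diagonal.
    """
    diff = features[:, None, :] - features[None, :, :]  # (T, T, d) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy example: 5 frames of hypothetical 2-D features for a periodic motion.
frames = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
ssm = self_similarity_matrix(frames)
```

Note how the first and last frames of the periodic toy motion produce a zero entry in the matrix: it is this pattern of within-sequence similarities, rather than the raw features, that the descriptor builds on, which is why it remains stable when the viewpoint changes.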
Key-words: Action Recognition, Self-Similarity, Sequence Alignment
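The dynamic-time-warping step mentioned in the abstract, used to recover speed fluctuations between two instances of the same action, can be sketched as follows. This is a generic textbook DTW cost computation on 1-D sequences, shown under the assumption that a per-frame dissimilarity has already been computed; the function name is hypothetical.

```python
import numpy as np

def dtw_cost(a, b):
    """Classic dynamic-time-warping cost between two 1-D sequences.

    Fills the (n+1) x (m+1) cumulative-cost table D, where D[i, j] is the
    minimal alignment cost of a[:i] and b[:j] under match/insert/delete steps.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A "fast" and a "slow" performance of the same motion align with zero cost,
# because DTW absorbs the speed difference by repeating frames.
fast = [0, 1, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
cost = dtw_cost(fast, slow)
```

In the cross-view synchronization setting described above, the same recurrence would be applied to distances between self-similarity descriptors of two sequences rather than to raw 1-D values.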