Recognizing action events from multiple viewpoints

Syeda-Mahmood, Tanveer; Vasilescu, Andrei; Sethi, Shilpa

doi:10.1109/event.2001.938868

Cited by 75 publications

(49 citation statements)

References 17 publications


“…In addition to this viewpoint change, other factors that make the problem even more challenging are the perspective or affine distortions (depending on the model used), anthropometric variations, or the speed at which the action is performed. Therefore, to make the problem more tractable, various simplifications or restricted special cases have been considered over the years [3,4,5,6,7,8,9]. We aim at alleviating such constraints.…”

Section: Introduction (mentioning)

confidence: 99%

Cross-View Action Recognition from Temporal Self-similarities

Junejo, Dexter, Laptev, et al. 2008

Lecture Notes in Computer Science

166

141


This paper concerns recognition of human actions under view changes. We explore self-similarities of action sequences over time and observe the striking stability of such measures across views. Building upon this key observation we develop an action descriptor that captures the structure of temporal similarities and dissimilarities within an action sequence. Despite this descriptor not being strictly view-invariant, we provide intuition and experimental validation demonstrating the high stability of self-similarities under view changes. Self-similarity descriptors are also shown to be stable under action variations within a class as well as discriminative for action recognition. Interestingly, self-similarities computed from different image features possess similar properties and can be used in a complementary fashion. Our method is simple and requires neither structure recovery nor multi-view correspondence estimation. Instead, it relies on weak geometric cues captured by self-similarities and combines them with machine learning for efficient cross-view action recognition. The method is validated on three public datasets; it has similar or superior performance compared to related methods, and it performs well even in extreme conditions such as when recognizing actions from top views while using side views for training only.

Keywords: Action Recognition, Self-Similarity, Sequence Alignment
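
To make the idea of a temporal self-similarity measure concrete, here is a minimal sketch, not the authors' implementation. It assumes each frame is already summarized by a feature vector and uses plain Euclidean distance; the function name `self_similarity_matrix` and the toy circular trajectory are illustrative assumptions. The toy "view change" is a rigid in-plane rotation, under which pairwise distances are preserved exactly; the abstract claims only stability, not strict invariance, under real camera view changes.

```python
import numpy as np


def self_similarity_matrix(features):
    """Return the (T, T) matrix of pairwise Euclidean distances between frames.

    `features` is a (T, D) array with one D-dimensional feature vector per
    frame. The choice of feature and distance is an assumption of this sketch.
    """
    features = np.asarray(features, dtype=float)
    # Broadcast to (T, T, D) differences, then take the norm over D.
    diff = features[:, None, :] - features[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Toy sequence: a 2-D point moving around a circle over 8 frames.
t = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
trajectory = np.stack([np.cos(t), np.sin(t)], axis=1)

# Hypothetical second "view": the same trajectory after a rigid rotation.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
rotated = trajectory @ rotation.T

ssm_a = self_similarity_matrix(trajectory)
ssm_b = self_similarity_matrix(rotated)
```

The two matrices agree here because the coordinates change while the pattern of distances over time does not, which is the intuition behind using such a matrix as a descriptor.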


“…The reconstructions of 3-D plans for events or actions have been explored [103], [136]. We have mostly reviewed event detection from 2-D image sequences.…”

Section: Discussion (mentioning)

confidence: 99%

Event Mining in Multimedia Streams

2008


Events are real-world occurrences that unfold over space and time. Event mining from multimedia streams improves the access and reuse of large media collections, and it has been an active area of research with notable recent progress. This paper contains a survey on the problems and solutions in event mining, approached from three aspects: event description, event-modeling components, and current event mining systems. We present a general characterization of multimedia events, motivated by the maxim of five "W"s and one "H" for reporting real-world events in journalism: when, where, who, what, why, and how. We discuss the causes for semantic variability in real-world descriptions, including multilevel event semantics, implicit semantics facets, and the influence of context. We discuss five main aspects of an event detection system. These aspects are: the variants of tasks and event definitions that constrain system design, the media capture setup that collectively define the available data and necessary domain assumptions, the feature extraction step that converts the captured data into perceptually significant numeric or symbolic forms, statistical models that map the feature representations to richer semantic descriptions, and applications that use event metadata to help in different information-seeking tasks. We review current event-mining systems in detail, grouping them by the problem formulations and approaches. The review includes detection of events and actions in one or more continuous sequences, events in edited video streams, unsupervised event discovery, events in a collection of media objects, and a discussion on ongoing benchmark activities. These problems span a wide range of multimedia domains such as surveillance, meetings, broadcast news, sports, documentary, and films, as well as personal and online media collections. We conclude this survey with a brief outlook on open research directions.


“…For example, Bobick & Davis (2001) proposed to capture the history of shape changes using temporal templates and Weinland et al (2006) extend these 2D templates to 3D action templates. Similarly, based on silhouettes, notions of action cylinders Syeda-Mahmood et al (2001), and space-time shapes Yilmaz & Shah (2005a); Gorelick et al (2007) have also been introduced. Recently, researchers have started analyzing video sequences as space-time volumes, built by various local features, such as intensities, gradients, optical flow etc Fathi & Mori (2008); Jhuang et al (2007); Filipovych & Ribeiro (2008).…”

Section: Introduction (mentioning)

confidence: 99%

Learning Self-Similarities for Action Recognition Using Conditional Random Fields

Junejo

2010

Bayesian Network


Human action recognition is a complex process due to many factors, such as variation in speeds, postures, camera motions etc. Therefore an extensive amount of research is being undertaken to gracefully solve this problem. To this end, in this paper, we introduce the application of self-similarity surfaces for human action recognition. These surfaces were introduced by Shechtman & Irani (CVPR'07) in the context of matching similarities between images or videos. These surfaces are obtained by matching a small patch, centered at a pixel, to its larger surroundings, aiming to capture similarities of a patch to its neighborhood. Once these surfaces are computed, we propose to transform these surfaces into Histograms of Oriented Gradients (HoG), which are then used to train Conditional Random Fields (CRFs). Our novelty lies in recognizing the utility of these self-similarity surfaces for human action recognition. In addition, in contrast to Shechtman & Irani (CVPR'07), we compute only a few of these surfaces (two per frame) for our task. The proposed method does not rely on the structure recovery nor on the correspondence estimation, but makes only mild assumptions about the rough localization of a person in the frame. We demonstrate good results on a publicly available dataset and show that our results are comparable to other well-known works in this area.
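
The patch-matching step described in this abstract can be sketched roughly as follows. This is an illustrative assumption, not the paper's code: it compares a small patch with every same-sized patch in a surrounding region using the sum of squared differences (SSD), and it omits any normalization of the surface as well as the later HoG and CRF stages. The function name `ssd_surface` and its parameters are hypothetical.

```python
import numpy as np


def ssd_surface(image, center, patch_radius=1, region_radius=3):
    """Compare the patch at `center` with patches in its neighborhood.

    Returns a (2r+1, 2r+1) array of sum-of-squared-differences values, where
    r is `region_radius`. Low values mark locations that resemble the patch.
    """
    image = np.asarray(image, dtype=float)
    cy, cx = center
    p, r = patch_radius, region_radius
    height, width = image.shape
    # Every candidate patch must lie fully inside the image.
    if cy - r - p < 0 or cx - r - p < 0 or cy + r + p >= height or cx + r + p >= width:
        raise ValueError("region plus patch does not fit inside the image")

    patch = image[cy - p:cy + p + 1, cx - p:cx + p + 1]
    surface = np.empty((2 * r + 1, 2 * r + 1))
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            y, x = cy + dy, cx + dx
            candidate = image[y - p:y + p + 1, x - p:x + p + 1]
            surface[dy + r, dx + r] = np.sum((patch - candidate) ** 2)
    return surface


# Toy frame of random intensities, standing in for a real video frame.
rng = np.random.default_rng(0)
frame = rng.random((16, 16))
surface = ssd_surface(frame, center=(8, 8))
```

The center of the surface is always zero because the patch is compared with itself there; the structure of the surrounding values is what carries the local self-similarity information.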
