Cross-View Action Recognition from Temporal Self-similarities

Junejo, Imran N.; Dexter, Emilie; Laptev, Ivan; Pérez, Patrick

doi:10.1007/978-3-540-88688-4_22

Cited by 166 publications

(151 citation statements)

References 26 publications

Supporting

Mentioning

141

Contrasting

Unclassified

Order By: Relevance

“…Experiments are conducted using the leave-one-out strategy followed by [28,8,25,21]. In each run, we select one actor for testing and all remaining subjects for training.…”

Section: Methodsmentioning

confidence: 99%

“…Accuracy rates obtained for an experiment aiming at only 11 actions, i.e. the 'point' action was not considered, reveals that we outperform all methods targeting this task [28,8,25] even if they considered a smaller set of subjects [8,25].…”

Section: Performancesmentioning

confidence: 96%

“…Since such projection from high dimensional space to low dimensional is multimodal, it impacts on the quality of the recognition rate [25]. Junejo et al [8] proposed to represent image sequences using self-similarity based descriptors which are fairly stable under view variation and characterises well the dynamics of the scene. However, this approach relies on the rough localisation and tracking of people in the video [8].…”

Section: Introductionmentioning

confidence: 99%

“…Junejo et al [8] proposed to represent image sequences using self-similarity based descriptors which are fairly stable under view variation and characterises well the dynamics of the scene. However, this approach relies on the rough localisation and tracking of people in the video [8]. In [28], a video is represented by a combination of 3D visual hulls with spatio-temporal volumes to build 4-dimensional action feature models.…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

View and Style-Independent Action Manifolds for Human Activity Recognition

Lewandowski

Makris

Nebel

2010

Computer Vision – ECCV 2010

View full text Add to dashboard Cite

Abstract. We introduce a novel approach to automatically learn intuitive and compact descriptors of human body motions for activity recognition. Each action descriptor is produced, first, by applying Temporal Laplacian Eigenmaps to view-dependent videos in order to produce a stylistic invariant embedded manifold for each view separately. Then, all view-dependent manifolds are automatically combined to discover a unified representation which model in a single three dimensional space an action independently from style and viewpoint. In addition, a bidirectional nonlinear mapping function is incorporated to allow projecting actions between original and embedded spaces. The proposed framework is evaluated on a real and challenging dataset (IXMAS), which is composed of a variety of actions seen from arbitrary viewpoints. Experimental results demonstrate robustness against style and view variation and match the most accurate action recognition method.

show abstract

“…Experiments are conducted using the leave-one-out strategy followed by [28,8,25,21]. In each run, we select one actor for testing and all remaining subjects for training.…”

Section: Methodsmentioning

confidence: 99%

Section: Performancesmentioning

confidence: 96%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

View and Style-Independent Action Manifolds for Human Activity Recognition

Lewandowski

Makris

Nebel

2010

Computer Vision – ECCV 2010

View full text Add to dashboard Cite

show abstract

“…Sparse pairwise topologies (stars, fans, parts) often suffer from a lack of appropriately annotated training data, as they often require annotations that specify the topology for training [7,8]. Alternatively, there are structured methods which can operate without such annotations, but at the cost of significantly more complicated and computationally expensive training or testing [9,10]. In the special limited case of a fixed camera, the entire topology can be fixed relative to the frame by simply using the absolute positions of features [11,12].…”

Section: Introductionmentioning

confidence: 99%

Representing Pairwise Spatial and Temporal Relations for Action Recognition

Matikainen

Hebert

Sukthankar

2010

Computer Vision – ECCV 2010

View full text Add to dashboard Cite

Abstract. The popular bag-of-words paradigm for action recognition tasks is based on building histograms of quantized features, typically at the cost of discarding all information about relationships between them. However, although the beneficial nature of including these relationships seems obvious, in practice finding good representations for feature relationships in video is difficult. We propose a simple and computationally efficient method for expressing pairwise relationships between quantized features that combines the power of discriminative representations with key aspects of Naïve Bayes. We demonstrate how our technique can augment both appearance-and motion-based features, and that it significantly improves performance on both types of features.

show abstract