2014
DOI: 10.1109/tip.2013.2291319
Modeling Geometric-Temporal Context With Directional Pyramid Co-Occurrence for Action Recognition

Abstract: In this paper, we present a new geometric-temporal representation for visual action recognition based on local spatio-temporal features. First, we propose a modified covariance descriptor under the log-Euclidean Riemannian metric to represent the spatio-temporal cuboids detected in the video sequences. Compared with the previously proposed covariance descriptor, our descriptor can be measured and clustered in Euclidean space. Second, to capture the geometric-temporal contextual information, we construct a Direc…
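The abstract's key idea is that symmetric positive-definite (SPD) covariance matrices, once mapped through the matrix logarithm, can be compared and clustered with ordinary Euclidean operations (the log-Euclidean metric). A minimal sketch of that idea, assuming per-voxel feature vectors sampled from a spatio-temporal cuboid (function names and the regularization constant are illustrative, not from the paper):

```python
import numpy as np

def covariance_descriptor(features):
    # features: (n_points, d) array of feature vectors sampled
    # from a spatio-temporal cuboid.
    C = np.cov(features, rowvar=False)
    # Small ridge so the matrix is strictly positive definite.
    C += 1e-6 * np.eye(C.shape[0])
    return C

def log_map(C):
    # Matrix logarithm of an SPD matrix via eigendecomposition:
    # log(C) = V diag(log w) V^T.
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.log(w)) @ V.T

def log_euclidean_distance(C1, C2):
    # Under the log-Euclidean metric, the distance between two SPD
    # matrices is the Frobenius norm of the difference of their logs.
    return np.linalg.norm(log_map(C1) - log_map(C2), ord="fro")
```

Because `log_map` lands every descriptor in a flat vector space, the log-mapped matrices (e.g. their vectorized upper triangles) can be fed directly to standard Euclidean k-means to build a codebook, which is what "measured and clustered in Euclidean space" enables.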

Cited by 20 publications (3 citation statements)
References 51 publications (122 reference statements)
“…The action recognition using handcrafted features descriptors such as extended SURF [22], HOG-3D [23], and some other shape and motion based features descriptors [24][25][26][27][28] have achieved remarkable performance for human action recognition. However, these approaches have several limitations: Handcrafted feature-based techniques require expert designed feature detectors, descriptors, and vocabulary building methods for feature extraction and representation.…”
Section: Related Work
Confidence: 99%
“…The spatial-temporal saliency is computed from the moving parts and the local orientation is determined. These local representations are converted into global features by computing the weighted average of each point inside the bounding box and analyzing the different geometrical properties [ 32 , 33 ].…”
Section: Introduction
Confidence: 99%
“…In the sequential method, the temporal features such as appearance and pose are obtained from the hidden Markov model [ 54 56 ], conditional random fields [ 57 – 60 ], and structured support vector machine [ 61 64 ]. Furthermore, representative key poses are learned for efficient representation of human actions [ 33 , 34 , 65 72 ] to build a compact pose sequence.…”
Section: Introduction
Confidence: 99%