Sparse coding-based space-time video representation for action recognition

Fu, Yinghua; Zhang, Tao; Wang, Wenjin

doi:10.1007/s11042-016-3630-9

Cited by 15 publications

(12 citation statements)

References 22 publications

“…Weizmann dataset: Similar methodologies have been compared with the proposed methodology for the Weizmann dataset [12,16,17,25,26,31] and the results are shown in Figure 8(a). The proposed method achieved an accuracy of 97.9%.…”

Section: Comparison Of the Proposed Algorithm With Different Methods On Standard Datasets (mentioning)

confidence: 99%

Modal Frequencies Based Human Action Recognition Using Silhouettes And Simplicial Elements

Kavimandan; Mishra; Kapoor

IJE, 2022

Human action recognition has been a pioneer research problem among researchers. This paper proposed a new local feature descriptor in terms of modal frequency using silhouette and simplicial elements of a silhouette with the help of Finite Element Analysis (FEA). This local descriptor represents the distinctive human poses in the form of modal frequency. These modal frequencies reduce the feature dimension and represent a wide range of poses of human action. These modal frequencies are subject to the stiffness matrix of the body that is associated with the displacement. The silhouettes of the human body are used for the analysis. These silhouettes are represented into simplicial elements. The modal frequencies of silhouettes are calculated using simplicial elements. These modal frequencies of the silhouette are used as the feature vectors that are given to the Radial Basis Function-Support Vector Machine (RBF-SVM) classifier. The challenging datasets Weizmann, KTH and IXMAS are used for validation of the proposed methodology.

“…This methodology does not require background subtraction as they are established on the spatio-temporal points. Methods reported in literature [6][7][8][9][10][11][12][13][14] have used famous bag-of-words models. The main disadvantage of these methodologies is that they give only motion information but no information about the structure.…”

Section: Introduction (mentioning)

confidence: 99%

Modal Frequencies Based Human Action Recognition Using Silhouettes And Simplicial Elements

Kavimandan; Mishra; Kapoor

IJE, 2022

“…Tong et al [17] presented a new nonnegative matrix factorization with local constraint and proposed a nonnegative matrix factorization with temporal dependencies constraint; the method can achieve an accuracy of 93.96% on the KTH dataset. Fu et al [18] proposed a method that uses multi-scale volumetric video representation and adaptively selects an optimal space–time scale under which the saliency of a patch is the most significant; the method can achieve an accuracy of 94.33% on the KTH dataset. Kovashka et al [19] proposed a method that first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point; the method can achieve an accuracy of 94.53% on the KTH dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Human action recognition based on HOIRM feature fusion and AP clustering BOW

et al. 2019

In this paper, we propose a human action recognition method using HOIRM (histogram of oriented interest region motion) feature fusion and a BOW (bag of words) model based on AP (affinity propagation) clustering. First, a HOIRM feature extraction method based on spatiotemporal interest points ROI is proposed. HOIRM can be regarded as a middle-level feature between local and global features. Then, HOIRM is fused with 3D HOG and 3D HOF local features using a cumulative histogram. The method further improves the robustness of local features to camera view angle and distance variations in complex scenes, which in turn improves the correct rate of action recognition. Finally, a BOW model based on AP clustering is proposed and applied to action classification. It obtains the appropriate visual dictionary capacity and achieves better clustering effect for the joint description of a variety of features. The experimental results demonstrate that by using the fused features with the proposed BOW model, the average recognition rate is 95.75% in the KTH database, and 88.25% in the UCF database, which are both higher than those by using only 3D HOG+3D HOF or HOIRM features. Moreover, the average recognition rate achieved by the proposed method in the two databases is higher than that obtained by other methods.
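The bag-of-words step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary here is a hypothetical hand-picked list of codewords (the paper obtains its vocabulary with AP clustering, which is not reproduced), the descriptors are made-up 2-D points standing in for real features, and the function names `nearest_word` and `bow_histogram` are assumptions for illustration only.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor: Sequence[float],
                 vocabulary: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def bow_histogram(descriptors: Sequence[Sequence[float]],
                  vocabulary: Sequence[Sequence[float]]) -> List[float]:
    """Normalized histogram of codeword assignments for one video."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram instead of dividing by zero.
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]


# Hypothetical vocabulary and descriptors (assumed toy values, not from the paper).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
descriptors = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (3.5, 0.3)]

histogram = bow_histogram(descriptors, vocabulary)
print(histogram)
```

In a full pipeline, a fixed-length histogram like this would be the per-video feature vector handed to a classifier; the choice of vocabulary size and clustering method is what the abstract's AP clustering step addresses.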

“…In [23] the binarized silhouette is used to find out the trace transform to represent the global feature of action. The sequence of a silhouette is represented as the cube video to model the action in [24]. The multiscale volumetric approach for action videos is used in [25,26]. The action is modeled using sparse coding of image sequences in [26].…”

Section: Introduction (mentioning)

confidence: 99%

“…The multiscale volumetric approach for action videos is used in [25,26]. The action is modeled using sparse coding of image sequences in [26]. The silhouette-based analysis is also used in deep learning-based methodologies [27, 28, and 29].…”

Section: Introduction (mentioning)

confidence: 99%

Human action recognition using descriptor based on selective finite element analysis

Kapoor; Mishra; Tripathi

Journal of Electrical Engineering, 2019

This paper proposes a novel local descriptor evaluated from the Finite Element Analysis for human action recognition. This local descriptor represents the distinctive human poses in the form of the stiffness matrix. This stiffness matrix gives the information of motion as well as shape change of the human body while performing an action. Initially, the human body is represented in the silhouette form. Most prominent points of the silhouette are then selected. This silhouette is discretized into several finite small triangle faces (elements) where the prominent points of the boundaries are the vertices of the triangles. The stiffness matrix of each triangle is then calculated. The feature vector representing the action video frame is constructed by combining all stiffness matrices of all possible triangles. These feature vectors are given to the Radial Basis Function-Support Vector Machine (RBF-SVM) classifier. The proposed method shows its superiority over other existing state-of-the-art methods on the challenging datasets Weizmann, KTH, Ballet, and IXMAS.
