Automatic Segmentation and Recognition of Human Actions in Monocular Sequences

Orrite, Carlos; Rodriguez, Mario; Herrero, Elías Revestido; Rogez, Grégory; Velastín, Sergio A.

doi:10.1109/icpr.2014.723

Cited by 14 publications

(34 citation statements)

References 10 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Accuracy Singh et al [31] 61.8% Orrite et al [32] 75.0% Cheema et al [33] 75.5% Murtaza et al [34] 81.6% Our method 91.2% information gain of data split. More concretely, denotes the probability that the action recognition task is selected at each node.…”

Section: Methodsmentioning

confidence: 99%

“…Since our method takes humancentered subvolumes recorded from multiple views as input, , , } ∈cub( ),V=1: , =1: ; Predefined parameters dep max , ∈ (0, 1), ∈ (0, 1), and ∈ (0, 1); Output: Decision tree Tree , ; (1) Build a bootstrap dataset , by random sampling from with replacement; (2) Create a root node and set its depth to 1, then assign all cuboids in , to it; (3) Initialize an unsettled node queue Υ = 0 and push the root node into Υ; (4) while Υ ̸ = 0 do (5) Pop the first node in Υ; (6) if depth of is larger than dep max or cuboids assigned to belong to the same action and position then (7) Label node as a leaf, and then calculate P and Q from cuboids at node ; (8) Add a triple ( , , ) into decision tree Tree , ; (9) else (10) Initialize the feature candidate set Δ = 0; (11) if random number < then (12) Add a set of randomly selected optical flow features to Δ; (13) else (14) Add a set of randomly selected HOG3D features to Δ; (15) end if (16) if random number < then (17) Add two-dimensional temporal context features to Δ; (18) end if (19) maxgain = −∞, generate a random number ; (20) for each ∈ Δ do (21) if < then (22) Search for the corresponding threshold and compute information gain ( ) in terms of action labels of cuboids arriving at ; (23) else (24) Search for the corresponding threshold and compute information gain ( ) in terms of positions of cuboids arriving at ; (25) end if (26) if ( ) > maxgain then (27) * = , * = ; (28) end if (29) end for (30) Create left children node and right children node , set their depth to dep + 1, and assign each cuboid arriving at to or according to * and * ; then push node * and * into Υ; (31) Add a quintuple ( , , , * , * ) into decision tree Tree , ; (32) end if (33) end while (34) return Decision tree Tree , ; Algorithm 1: Construction of a decision tree.…”

Section: Experimental Settingmentioning

confidence: 99%

See 1 more Smart Citation

Learning a Mid-Level Representation for Multiview Action Recognition

Liu

Shi

et al. 2018

Advances in Multimedia

View full text Add to dashboard Cite

Recognizing human actions in videos is an active topic with broad commercial potentials. Most of the existing action recognition methods are supposed to have the same camera view during both training and testing. And thus performances of these single-view approaches may be severely influenced by the camera movement and variation of viewpoints. In this paper, we address the above problem by utilizing videos simultaneously recorded from multiple views. To this end, we propose a learning framework based on multitask random forest to exploit a discriminative mid-level representation for videos from multiple cameras. In the first step, subvolumes of continuous human-centered figures are extracted from original videos. In the next step, spatiotemporal cuboids sampled from these subvolumes are characterized by multiple low-level descriptors. Then a set of multitask random forests are built upon multiview cuboids sampled at adjacent positions and construct an integrated mid-level representation for multiview subvolumes of one action. Finally, a random forest classifier is employed to predict the action category in terms of the learned representation. Experiments conducted on the multiview IXMAS action dataset illustrate that the proposed method can effectively recognize human actions depicted in multiview videos.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Experimental Settingmentioning

confidence: 99%

Learning a Mid-Level Representation for Multiview Action Recognition

Liu

Shi

et al. 2018

Advances in Multimedia

View full text Add to dashboard Cite

show abstract

“…The second direction is based on multi-view learning during training and testing of unknown action is done based on these learned features. As in [21]- [23] no feature fusion is used therefore there is no need to have all camera views available during training stage. The advantage of their approach is that they can handle missing views of an action.…”

Section: Introductionmentioning

confidence: 99%

Multi-view Human Action Recognition Using Histograms of Oriented Gradients (HOG) Description of Motion History Images (MHIs)

Murtaza

Yousaf

Velastín

2015

2015 13th International Conference on Frontiers of Information Technology (FIT)

Self Cite

View full text Add to dashboard Cite

In this paper, a silhouette-based view-independent human action recognition scheme is proposed for multi-camera dataset. To overcome the high-dimensionality issue, incurred due to multi-camera data, the low-dimensional representation based on Motion History Image (MHI) was extracted. A single MHI is computed for each view/action video. For efficient description of MHIs Histograms of Oriented Gradients (HOG) are employed. Finally the classification of HOG based description of MHIs is based on Nearest Neighbor (NN) classifier. The proposed method does not employ feature fusion for multi-view data and therefore this method does not require a fixed number of cameras setup during training and testing stages. The proposed method is suitable for multi-view as well as single view dataset as no feature fusion is used. Experimentation results on multi-view MuHAVi-14 andMuHAVi-8 datasets give high accuracy rates of 92.65% and 99.26% respectively using Leave-One-Sequence-Out (LOSO) cross validation technique as compared to similar state-of-theart approaches. The proposed method is computationally efficient and hence suitable for real-time action recognition systems.

show abstract

“…end for 5: % after above loop 6: end for Uncut silhouette video τ (1) τ (2) τ (3) … τ (w) Generating Motion History Images (MHIs)…”

Section: B Clustering Of Mhis Into Action Proposalsmentioning

confidence: 99%

“…Most existing methods exhaustively apply an action classifier at every frame in a sliding window fashion for video segmentation [2][3][4][5][6]. These approaches are computationally expensive for the analysis of large-scale videos.…”

mentioning

confidence: 99%