2018
DOI: 10.1186/s13640-018-0250-5

Action recognition using length-variable edge trajectory and spatio-temporal motion skeleton descriptor

Abstract: Representing the features of different types of human actions in unconstrained videos is challenging because of camera motion, cluttered backgrounds, and occlusions. This paper aims to obtain an effective and compact action representation using a length-variable edge trajectory (LV-ET) and a spatio-temporal motion skeleton (STMS). First, to better describe long-term motion information for action representation, a novel edge-based trajectory extraction strategy is introduced that tracks edge points from m…
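The abstract is cut off before it explains how LV-ET is built, so the following is only a generic Python sketch of the idea named in the title and in one of the citation statements: trajectories seeded at edge points whose length adapts to how long each point keeps moving. It is not the authors' algorithm. The central-difference edge detector, the per-frame displacement callback `flow`, the stopping rule (`min_speed`, `max_len`), and all function names are hypothetical choices made for illustration. Only the shape descriptor (displacements normalized by their summed magnitude) follows the form used by dense trajectories.

```python
from math import hypot
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical interface: flow(t, x, y) -> (dx, dy), the displacement of the
# point (x, y) between frame t and frame t + 1 (e.g. sampled from optical flow).
Flow = Callable[[int, float, float], Tuple[float, float]]


def edge_points(img: List[List[float]], thresh: float) -> List[Point]:
    """Seed points where the central-difference gradient magnitude exceeds thresh."""
    pts: List[Point] = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            if hypot(gx, gy) > thresh:
                pts.append((float(x), float(y)))
    return pts


def track_variable_length(start: Point, t0: int, num_frames: int, flow: Flow,
                          min_speed: float = 0.5, max_len: int = 15) -> List[Point]:
    """Follow one seed until it stops moving, the clip ends, or max_len steps are taken.

    The trajectory length therefore depends on how long the point keeps moving
    (an assumed stopping rule, not one taken from the paper).
    """
    traj = [start]
    x, y = start
    t = t0
    while t < num_frames - 1 and len(traj) - 1 < max_len:
        dx, dy = flow(t, x, y)
        if hypot(dx, dy) < min_speed:  # motion has (nearly) stopped: end early
            break
        x, y = x + dx, y + dy
        traj.append((x, y))
        t += 1
    return traj


def shape_descriptor(traj: List[Point]) -> List[float]:
    """Concatenated displacement vectors normalized by their summed magnitude."""
    disps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:])]
    total = sum(hypot(dx, dy) for dx, dy in disps)
    if total == 0:
        return []  # a static point carries no shape information
    return [c / total for d in disps for c in d]


if __name__ == "__main__":
    # Toy frame: a vertical edge (columns x >= 3 are bright) in a 5 x 5 image.
    frame = [[1.0 if x >= 3 else 0.0 for x in range(5)] for _ in range(5)]
    seeds = edge_points(frame, thresh=0.25)

    def step_flow(t: int, x: float, y: float) -> Tuple[float, float]:
        # Toy motion: everything moves 1 px right for 3 frames, then stops.
        return (1.0, 0.0) if t < 3 else (0.0, 0.0)

    traj = track_variable_length(seeds[0], t0=0, num_frames=20, flow=step_flow)
    print(len(seeds), len(traj), shape_descriptor(traj))
```

Because trajectory length varies, the descriptor length varies too; how the paper turns such descriptors into a fixed-size video representation is not recoverable from this page.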

Cited by 15 publications (7 citation statements)
References: 40 publications
“…The optimal value in the table is presented in bold. The approach here is compared with not only handcraft approaches (e.g., improved dense trajectories (IDT) [ 12 ], saliency-based trajectory (ST) [ 25 ], and spatiotemporal motion skeleton representation [ 26 ]), but also deep learning representations (e.g., 3D-CNN (C3D) [ 11 ], two-stream CNN [ 27 ], trajectory-pooled deep-convolutional descriptors (TDD) [ 30 ], motion-salient-region convolutional neural network (MSR-CNN) [ 10 ], tube convolutional neural network (T-CNN) [ 32 ], sequential trajectory texture (STT) deep representation [ 33 ], three-stream CNN [ 29 ], long-term temporal convolutional network (LTC) [ 35 ], and temporal segment network (TSN) [ 34 ] with RGB and OF modalities). Moreover, the method here is compared with many attention-based methods such as dual attention convolutional network (DANet-50) [ 39 ], spatial-temporal attention network (STA-CNN) [ 42 ], spatiotemporal attention module (SAM) [ 41 ], and unified spatiotemporal attention networks (STANs) [ 40 ].…”
Section: Methods · mentioning · confidence: 99%
“…In this end, Yi and Lin [ 24 ] and Xu et al [ 25 ] aimed to select dense trajectories from salient motion areas and encoded them into a compact video representation. Unlike [ 24 , 25 ] utilizing video saliency to select action-related trajectory, the authors in [ 26 ] proposed a length-variable edge trajectory extracted from edge points to model different speeds of motion. These approaches commonly focused on capturing edge, corner, and motion features through trajectories and were successful in recognizing relatively simple actions.…”
Section: Related Work · mentioning · confidence: 99%
“…Yan [23] adopted a sparse algorithm to extract the spatial and temporal features of sports motion, and then used the neural network to establish sports motion recognition model. Wen [24] extracted the energy diagram and motion descriptor of sports movements, and established the sports movement recognition model by using the support vector machine. The above researches all use a deep convolutional neural network to recognize and classify sports images.…”
Section: Related Work · mentioning · confidence: 99%
“…In a broad sense, the prediction task can be seen as a human activity recognition task with limited observed data. Although great progress has been made in recognizing activities in complete image sequences [4][5][6], video-based human activity prediction in the early stage is still a challenging task.…”
Section: Introduction · mentioning · confidence: 99%