2018
DOI: 10.1007/s00371-018-1489-7
Improving bag-of-poses with semi-temporal pose descriptors for skeleton-based action recognition

Abstract: Over the last few decades, human action recognition has become one of the most challenging tasks in the field of computer vision. Employing economical depth sensors such as Microsoft Kinect as well as recent successes of deep learning approaches in image understanding has led to

Cited by 34 publications (9 citation statements)
References 83 publications
“…50 segments. A segment is assigned a positive label if: (1) the segment overlaps a ground-truth instance with the highest temporal intersection over union (tIoU); or (2) the segment has a tIoU greater than 0.5 with any ground-truth instance. A segment that does not overlap any ground-truth instance is assigned a negative label.…”
Section: Performance Evaluation and Analysis
confidence: 99%
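The segment-labeling rule quoted above can be sketched as follows. This is a minimal illustration, not the cited paper's implementation; the function names (`tiou`, `assign_labels`) and the interval representation `[start, end]` are assumptions, and segments that satisfy neither the positive nor the negative condition are left unassigned (label 0) since the quoted rule does not define them.

```python
def tiou(seg, gt):
    """Temporal intersection-over-union of two [start, end] intervals."""
    inter = max(0.0, min(seg[1], gt[1]) - max(seg[0], gt[0]))
    union = (seg[1] - seg[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def assign_labels(segments, ground_truths, thresh=0.5):
    """Label each segment: 1 = positive, -1 = negative, 0 = unassigned."""
    n = len(segments)
    scores = [[tiou(s, g) for g in ground_truths] for s in segments]
    labels = [0] * n
    # (1) the segment with the highest tIoU against each ground truth is positive
    for j in range(len(ground_truths)):
        best = max(range(n), key=lambda i: scores[i][j])
        if scores[best][j] > 0.0:
            labels[best] = 1
    # (2) any segment whose tIoU exceeds the threshold for some ground truth
    # is positive; a segment overlapping no ground truth at all is negative
    for i in range(n):
        if max(scores[i]) > thresh:
            labels[i] = 1
        elif max(scores[i]) == 0.0 and labels[i] == 0:
            labels[i] = -1
    return labels
```

For example, with segments `[(0, 1), (1, 2), (2, 3)]` and a single ground truth `(0.9, 2.1)`, only the middle segment exceeds the 0.5 tIoU threshold and is labeled positive.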
“…A video is a sequence of image frames that can be fed to a CNN for analysis. In still images, a CNN can identify features on its own [1]; in videos, however, it is critical to capture the temporal context between the extracted frames when labelling, to avoid losing information. Consider an image of a half-filled cardboard box: it could be labelled as packing or unpacking a box depending on the frames before and after it.…”
Section: Introduction
confidence: 99%
“…In the past few years, researchers have examined human actions by extracting skeleton information through video sensors. In [12], the input frame sequences were used to extract temporal-displacement skeleton poses, and k-means clustering was applied to generate key poses. An SVM classifier was then used to classify each action pose.…”
Section: Action Recognition via Video Sensor
confidence: 99%
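The key-pose pipeline described in the citation above (k-means clustering of skeleton poses, then SVM classification) can be sketched roughly as below. This is a simplified bag-of-poses illustration under stated assumptions: pose frames are flattened joint-coordinate vectors, scikit-learn's `KMeans` and `SVC` stand in for the clustering and classifier stages, and the toy data, feature layout, and `key_pose_histograms` helper are all hypothetical rather than taken from the cited work.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def key_pose_histograms(sequences, kmeans):
    """Encode each pose sequence as a normalised histogram over
    key-pose clusters (a bag-of-poses representation)."""
    hists = []
    for seq in sequences:                       # seq: (frames, joints * 3)
        ids = kmeans.predict(seq)               # nearest key pose per frame
        h = np.bincount(ids, minlength=kmeans.n_clusters).astype(float)
        hists.append(h / max(h.sum(), 1.0))     # normalise by sequence length
    return np.vstack(hists)

# toy data: 20 sequences of 30 frames, 15 joints with (x, y, z) coordinates
rng = np.random.default_rng(0)
sequences = [rng.normal(size=(30, 45)) for _ in range(20)]
labels = rng.integers(0, 2, size=20)

# learn key poses over all frames pooled across sequences
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
kmeans.fit(np.vstack(sequences))

# classify actions from the per-sequence key-pose histograms
X = key_pose_histograms(sequences, kmeans)
clf = SVC(kernel="rbf").fit(X, labels)
```

Pooling all frames before clustering means the key poses are shared across action classes; each sequence is then summarised only by how often it visits each key pose, discarding frame order (the "semi-temporal" descriptors in the titled paper are one way to reintroduce ordering).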
“…HAR techniques can be divided into three groups: vision-based, sensor-based, and WiFi-based. Several image-based HAR methods have been published in recent years, using RGB (red, green, and blue) [18], depth [19], and skeleton [20] datasets. RGB data may not be robust enough for this approach when the video contains considerable sudden camera movement or a cluttered background.…”
Section: Related Work
confidence: 99%