2018
DOI: 10.1016/j.neucom.2018.08.066

A structured multi-feature representation for recognizing human action and interaction

Abstract: Active research has been carried out on human action recognition using 3D human skeleton joints since the release of cost-efficient RGB-D sensors. However, extracting discriminative features from noisy skeleton sequences to effectively distinguish various human action or interaction categories remains challenging. This paper proposes a structured multi-feature representation for human action and interaction recognition. Specifically, a novel kernel enhanced bag of semantic words (BSW) is designed to repr…
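
The abstract is truncated, so the paper's kernel-enhanced BSW construction is not fully specified here. As a rough illustration only, the sketch below shows a generic bag-of-words pipeline over per-frame skeleton features: quantize frames into "semantic words" with k-means, histogram the words per sequence, and compare histograms with an exponential chi-squared kernel. All names (`build_codebook`, `bow_histogram`, `chi2_kernel`), the codebook size, and the kernel choice are assumptions, not the paper's actual method.

```python
# Hypothetical bag-of-words sketch over skeleton features; the paper's
# kernel-enhanced BSW details are truncated in the abstract above.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(frame_features, k=64):
    """Quantize per-frame skeleton features into k 'semantic words'.

    frame_features: (N, d) array pooled from many training sequences.
    """
    return KMeans(n_clusters=k, n_init=10).fit(frame_features)

def bow_histogram(codebook, sequence):
    """Represent a (T, d) skeleton sequence as a normalized word histogram."""
    words = codebook.predict(sequence)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def chi2_kernel(h1, h2, gamma=0.5, eps=1e-10):
    """Exponential chi-squared kernel, a common choice for comparing
    histograms; the paper's actual kernel may differ."""
    return np.exp(-gamma * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))
```

Histograms produced this way could then feed a kernel classifier such as an SVM; again, this mirrors standard bag-of-words practice, not a confirmed detail of the paper.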

Cited by 27 publications (10 citation statements)
References: 49 publications
“…One reason is that our method mainly targets human interaction recognition; the features of single-person actions were weakened by the side effect of zero padding, which affected our recognition results.

Method                    Cross-Subject  Cross-View
Hierarchical RNN [11]     59.10%         64.00%
Dynamic skeletons [34]    60.23%         65.22%
ST-LSTM+Trust Gate [13]   69.20%         77.70%
Two-stream RNNs [24]      71.30%         79.50%
STA-LSTM [30]             73.40%         81.20%
Res-TCN [22]              74.30%         83.10%
ST-GCN [35]               81.50%         88.30%
Multiview IJTM [21]       82.96%         90.12%
HCN [31]                  86.50%         91.10%
Proposed Method           82.53%         91.75%…”
Section: Methods Accuracy
Citation type: mentioning (confidence: 99%)
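
The zero-padding side effect mentioned above refers to how two-person interaction models use a fixed two-subject input layout: when only one subject is present, the second subject's slots are filled with zeros. A minimal sketch, assuming NTU-style skeletons with 25 joints of 3D coordinates per subject (the function name and layout are illustrative, not taken from the cited paper):

```python
import numpy as np

def pad_to_two_subjects(seq, n_joints=25, dims=3):
    """Pad a single-subject skeleton sequence to a two-subject layout.

    seq: (T, n_joints * dims) array for one subject.
    Returns (T, 2 * n_joints * dims), with the absent second subject
    zero-filled to match the fixed input size of an interaction model.
    """
    T = seq.shape[0]
    padding = np.zeros((T, n_joints * dims), dtype=seq.dtype)
    return np.concatenate([seq, padding], axis=1)
```

With this layout, half of every frame's feature vector is zeros for single-person actions, which dilutes the informative part of the input and can plausibly weaken single-person recognition, as the quoted passage observes.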
“…For the original NTU RGB+D dataset, we transformed the original coordinate system into a human-centric coordinate system. Different from [30], we always chose the first person's body center as the origin of the coordinate system, in order to better express the relative position between the two subjects. Furthermore, the coordinate transformation eliminates the influence of viewpoint differences across actions.…”
Section: Implementation Details
Citation type: mentioning (confidence: 99%)
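
The human-centric transformation described in this excerpt amounts to translating all joint coordinates so that the first subject's body center becomes the origin. A minimal sketch under assumed conventions (a (T, P, J, 3) array; anchoring on the first frame and using a spine-middle joint index are assumptions the quote does not specify):

```python
import numpy as np

def to_human_centric(joints, center_joint=1):
    """Translate a skeleton sequence into human-centric coordinates.

    joints: (T, P, J, 3) array for P subjects with J joints over T frames.
    The first subject's body-center joint (index `center_joint`; NTU's
    'middle of the spine' is joint 1, an assumption here) in the first
    frame becomes the origin. Using one shared origin preserves the
    relative position between the two subjects, as the quote notes.
    """
    origin = joints[0, 0, center_joint]   # first subject, first frame
    return joints - origin                # broadcasts over T, P, J
```

Subtracting a single shared origin, rather than centering each subject independently, is what keeps the inter-subject geometry intact for interaction recognition.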
“…The FER task can be divided into two main categories depending on the type of input: static-image FER and dynamic-sequence FER. For the static recognition task [8]-[10], the semantic information is encoded from a single input image, while for the dynamic recognition task [11], [12], the hidden-layer representation depends on the temporal relations among contiguous frames of the input facial expression sequence. In 2019, J. Li et al. [13] used a Microsoft Kinect to collect an RGB-D dataset and applied a two-stream network to the dynamic recognition task.…”
Section: A. The Facial Expression Recognition Task
Citation type: mentioning (confidence: 99%)
“…In many studies, such as [21] and [22], the selection and calibration of multiple types of sensors are considered key issues. The key sensors consist of an industrial camera, a lens, a light source, and an image acquisition card.…”
Section: B. Computer Vision System
Citation type: mentioning (confidence: 99%)