2020
DOI: 10.1109/access.2020.3023599

Infrared and 3D Skeleton Feature Fusion for RGB-D Action Recognition

Abstract: For skeleton-based action recognition from depth cameras, distinguishing object-related actions with similar motions is a difficult task. The other available video streams (RGB, infrared, depth) may provide additional clues, given an appropriate feature fusion strategy. We propose a modular network combining skeleton and infrared data. A pre-trained 2D convolutional neural network (CNN) is used as a pose module to extract features from skeleton data. A pre-trained 3D CNN is used as an infrared module to extrac…

Cited by 43 publications (23 citation statements)
References 52 publications
“…The results are shown in Table 2. We compare our method with simple-based fusion, such as BI-LSTM [21] and FUSION [22], and attention-based fusion, such as MMT-M [16] and VPN(I3D) [14]. The evaluation results show that…”
Section: Comparisons With the State-of-the-art
Citation type: mentioning
confidence: 99%
“…Other available video streams (RGB, infrared, depth) provide additional clues. Boissiere [31] proposed a modular network combining skeleton and infrared data. The pre-trained 2D CNN was used as a pose module to extract features from the skeleton data.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…It differs from offline action detection where the sequence is studied in its entirety before temporal segments are proposed, then classified. Following the works of [8], we focus on infrared data from RGB-D cameras to further perpetuate their legitimacy as a stream candidate for action recognition and detection. Infrared from RGB-D cameras shows a strong potential for security and night-vision applications.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Infrared from RGB-D cameras shows a strong potential for security and night-vision applications. In [8], infrared yields great results for action recognition. In this work, we go one step further and use infrared as a ready-to-embed network for real-time online action detection.…”
Section: Introduction
Citation type: mentioning
confidence: 99%