2017
DOI: 10.1109/tpami.2016.2640292
Jointly Learning Heterogeneous Features for RGB-D Activity Recognition

Abstract: In this paper, we focus on heterogeneous feature learning for RGB-D activity recognition. We find that features from different channels (RGB, depth) can share similar hidden structures, and we propose a joint learning model to simultaneously explore the shared and feature-specific components as an instance of heterogeneous multi-task learning. The proposed model, formed in a unified framework, is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features…
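As a rough illustration of the shared-plus-specific idea the abstract describes, the sketch below factorizes each modality's feature matrix into a component built from a latent representation shared across RGB and depth plus a modality-specific component, fitted by plain alternating least squares. This is a JIVE-style stand-in under assumed dimensions, not necessarily the paper's actual optimization; all names (`shared_specific_als`, `k_shared`, `k_specific`) and sizes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_specific_als(X_list, k_shared=10, k_specific=5, n_iters=50):
    """Fit X_m ~= S @ A_m + U_m @ B_m for every modality m, where S is a
    latent representation shared across modalities and U_m is specific
    to modality m. Plain alternating least squares, no regularization."""
    n = X_list[0].shape[0]
    S = rng.standard_normal((n, k_shared))
    U = [rng.standard_normal((n, k_specific)) for _ in X_list]
    A, B = [None] * len(X_list), [None] * len(X_list)
    for _ in range(n_iters):
        # 1) Loadings per modality, given the latent factors.
        for m, X in enumerate(X_list):
            C = np.hstack([S, U[m]])                 # (n, k_shared + k_specific)
            L = np.linalg.lstsq(C, X, rcond=None)[0]
            A[m], B[m] = L[:k_shared], L[k_shared:]
        # 2) Shared factor, solved jointly over all modalities.
        R = np.hstack([X - U[m] @ B[m] for m, X in enumerate(X_list)])
        A_all = np.hstack(A)
        S = np.linalg.lstsq(A_all.T, R.T, rcond=None)[0].T
        # 3) Modality-specific factors, one least-squares solve each.
        for m, X in enumerate(X_list):
            U[m] = np.linalg.lstsq(B[m].T, (X - S @ A[m]).T, rcond=None)[0].T
    return S, U, A, B

# Toy matrices standing in for RGB and depth descriptors of the same clips.
X_rgb = rng.standard_normal((200, 64))
X_dep = rng.standard_normal((200, 48))
S, U, A, B = shared_specific_als([X_rgb, X_dep])
print("RGB residual:", np.linalg.norm(X_rgb - (S @ A[0] + U[0] @ B[0])))
```

Here S plays the role of the structure shared across channels, so a downstream classifier could consume [S, U_rgb, U_dep] instead of a raw feature concatenation.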

Cited by 265 publications (260 citation statements) · References 42 publications · Citing publications span 2018–2024
“…SYSU dataset: For the empirical evaluations, we compare our DACNN algorithm against baselines including CNN+DPRL [28], ST-LSTM+Trust Gate [13], Dynamic Skeletons [35], LAFF(SKL) [45], SR-TSL [46], VA-LSTM [47], and GCA-LSTM [48], which include the most recent deep-learning approaches (CNN, LSTM, etc.) reported on this dataset.…”
Section: Action Recognition Results (mentioning)
Confidence: 99%
“…SYSU-3D Human-Object Interaction dataset (SYSU) [35]: This dataset contains 12 action classes recorded in 480 video sequences. It was captured from 40 human subjects, and each frame provides the 3D coordinates of 20 body joints.…”
Section: Montalbano V2 Dataset (mentioning)
Confidence: 99%
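To make the quoted dataset description concrete, a skeleton sequence of the kind described (20 joints with 3D coordinates per frame) is naturally a (frames, joints, 3) array. The root-centering step and root index below are illustrative assumptions, not part of the dataset specification.

```python
import numpy as np

T, J = 75, 20                          # frames in a clip, joints per frame
skeleton = np.random.randn(T, J, 3)    # stand-in for (frames, joints, xyz)

# Common preprocessing (an assumption, not from the dataset spec):
# translate each frame so a chosen root joint sits at the origin.
ROOT = 0                               # hypothetical root-joint index
skeleton = skeleton - skeleton[:, ROOT:ROOT + 1, :]
print(skeleton.shape)                  # (75, 20, 3)
```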
“…An intuitive way to combine multimodal features is to concatenate them directly. To mine more useful information from multimodal features and improve performance, researchers have proposed explicitly learning shared-specific structures among the features.…”
Section: Related Work (mentioning)
Confidence: 99%
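The direct-concatenation baseline this quote contrasts against amounts to a single `np.concatenate` per sample; the feature dimensions below are illustrative only, not taken from any of the cited papers.

```python
import numpy as np

rgb_feats = np.random.randn(480, 512)    # per-clip RGB descriptors (made-up sizes)
depth_feats = np.random.randn(480, 256)  # per-clip depth descriptors
fused = np.concatenate([rgb_feats, depth_feats], axis=1)
print(fused.shape)  # (480, 768)
```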
“…To mine more useful information from multimodal features and improve performance, researchers have proposed explicitly learning shared-specific structures among features [11,53]. Early on, Liu and Shao [54] used a genetic programming framework to improve not only the RGB and depth descriptors but also their fusion simultaneously through an iterative evolution. Ni et al. [55] concatenated depth- and RGB-based representations of spatiotemporal interest points for better RGB+D information fusion.…”
Section: Multimodal 3D Action Recognition (mentioning)
Confidence: 99%