Proceedings of the 26th ACM International Conference on Multimedia 2018
DOI: 10.1145/3240508.3240675

A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition

Abstract: Current research on action recognition mainly focuses on single-view and multi-view recognition, which can hardly satisfy the requirements of human-robot interaction (HRI) applications that must recognize actions from arbitrary views. The lack of suitable datasets is a further barrier. To provide data for arbitrary-view action recognition, we collected a large-scale RGB-D action dataset for arbitrary-view action analysis, including RGB videos, depth and skeleton sequences. The dataset includes action samples captured in…
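The modalities named in the abstract (RGB video, depth maps, and skeleton sequences) imply a per-sample loading routine along the lines of the Python sketch below. This is a minimal illustration under an assumed, hypothetical file layout (rgb.avi, per-frame 16-bit depth PNGs, a plain-text skeleton file); the dataset's actual release format may differ.

import os
import glob
import numpy as np
import cv2  # opencv-python

def load_rgbd_action_sample(sample_dir):
    """Load one action sample: RGB frames, depth frames, skeleton sequence.

    Assumes a hypothetical layout (not the official release format):
        sample_dir/rgb.avi       -- RGB video
        sample_dir/depth/*.png   -- per-frame 16-bit depth maps
        sample_dir/skeleton.txt  -- one row per frame, J joints x 3 coords
    """
    # RGB: decode every video frame into an HxWx3 array.
    rgb_frames = []
    cap = cv2.VideoCapture(os.path.join(sample_dir, "rgb.avi"))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb_frames.append(frame)
    cap.release()

    # Depth: read 16-bit PNGs unchanged to preserve raw depth values.
    depth_frames = [
        cv2.imread(p, cv2.IMREAD_UNCHANGED)
        for p in sorted(glob.glob(os.path.join(sample_dir, "depth", "*.png")))
    ]

    # Skeleton: reshape flat rows into a (T, J, 3) coordinate array.
    skeleton = np.loadtxt(os.path.join(sample_dir, "skeleton.txt"))
    skeleton = skeleton.reshape(len(skeleton), -1, 3)

    return rgb_frames, depth_frames, skeleton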

Cited by 64 publications (45 citation statements)
References 44 publications
“…The size of NTU allows training deep neural networks unlike previous datasets. Very recently, Ji et al. (Ji et al. 2018) collected the first large-scale dataset, UESTC, that has a 360° coverage around the performer, although still in a lab setting.…”
Section: Related Work
mentioning confidence: 99%
“…UESTC is a recent dataset (Ji et al. 2018) that systematically collects 8 equally separated viewpoints that cover 360° around a person (see Fig. 5).…”
Section: UESTC RGB-D Varying-view 3D Action Dataset (UESTC)
mentioning confidence: 99%
“…In Reference [29], the authors generated a synthetic multi-view video sequence from one view, and then trained a 3D ResNet-50 [40] on both synthetic and real data to classify actions. Among these methods, Varol et al. [29] is the only work that provides cross-view evaluation through single-view training; the resulting accuracy on the UESTC dataset [46] was then increased when they used additional synthetic multi-view data for training.…”
Section: Related Work
mentioning confidence: 99%
“…The contextual features were detected using the CNN model. In addition, Yanli et al., 2018 [34] proposed a View-guided Skeleton-CNN (VS-CNN) model for arbitrary-view human action recognition, which weakens view differences by visualizing skeleton sequences and covers a larger range of view angles. Hug et al., 2019 [35] applied an action recognition model based on transforming the skeleton into a spatial representation by converting the distance values of two joints into color points, and they used the DenseNet CNN model for action classification.…”
Section: Related Work
mentioning confidence: 99%
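The distance-to-color encoding described in the last statement can be made concrete with a short sketch: compute the Euclidean distance between every joint pair in every frame, normalize, and map the distances to colors so the whole sequence becomes an image that an image CNN such as DenseNet can classify. The mapping below is an assumed illustrative variant, not the exact encoding of the cited work; the shapes (T frames, J joints) are hypothetical.

import numpy as np

def skeleton_to_distance_image(skeleton, eps=1e-8):
    """Encode a (T, J, 3) skeleton sequence as a pseudo-color image.

    Returns a (P, T, 3) uint8 image: one row per joint pair, one column
    per frame, with distance mapped to color. Illustrative variant only,
    not the exact mapping of the cited work.
    """
    T, J, _ = skeleton.shape
    pairs = [(i, j) for i in range(J) for j in range(i + 1, J)]

    # Pairwise Euclidean distances per frame: shape (P, T).
    dist = np.stack(
        [np.linalg.norm(skeleton[:, i] - skeleton[:, j], axis=1) for i, j in pairs]
    )

    # Normalize to [0, 1] per sequence, then spread over three channels
    # so nearby distances receive distinguishable colors.
    d = (dist - dist.min()) / (dist.max() - dist.min() + eps)
    img = np.stack([d, 1.0 - d, np.abs(2.0 * d - 1.0)], axis=-1)
    return (img * 255).astype(np.uint8)

# Example: a random 60-frame, 25-joint sequence becomes a (300, 60, 3)
# image that can be resized and fed to an image classifier.
image = skeleton_to_distance_image(np.random.rand(60, 25, 3))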