Deep Neural Networks in Video Human Action Recognition: A review

Wang, Zihan; Zheng, Yifan; Yang, Yang; Li, Yujun

doi:10.36227/techrxiv.22146914.v1

Cited by 4 publications

(2 citation statements)

References 99 publications

“…HAR is based on computer vision with RGB, skeletal, and depth input representation. Wang et al [15] surveyed on HAR based on input data can be the skeleton, RGB, RGB + D, optical flow, etc. In the study, the authors only present and analyze the ST-GCN (Spatial Temporal Graph Convolutional Networks) [16] and 3D CNNs (3D Convolutional Neural Networks) hybrid with some architecture [17,18].…”

Section: Related Work (mentioning)

confidence: 99%

Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study

Nguyen¹, Nguyen², Scherer et al. 2023

Sensors

Human activity recognition (HAR) is an important research problem in computer vision. This problem is widely applied to building applications in human–machine interactions, monitoring, etc. Especially, HAR based on the human skeleton creates intuitive applications. Therefore, determining the current results of these studies is very important in selecting solutions and developing commercial products. In this paper, we perform a full survey on using deep learning to recognize human activity based on three-dimensional (3D) human skeleton data as input. Our research is based on four types of deep learning networks for activity recognition based on extracted feature vectors: Recurrent Neural Network (RNN) using extracted activity sequence features; Convolutional Neural Network (CNN) uses feature vectors extracted based on the projection of the skeleton into the image space; Graph Convolution Network (GCN) uses features extracted from the skeleton graph and the temporal–spatial function of the skeleton; Hybrid Deep Neural Network (Hybrid–DNN) uses many other types of features in combination. Our survey research is fully implemented from models, databases, metrics, and results from 2019 to March 2023, and they are presented in ascending order of time. In particular, we also carried out a comparative study on HAR based on a 3D human skeleton on the KLHA3D 102 and KLYOGA3D datasets. At the same time, we performed analysis and discussed the obtained results when applying CNN-based, GCN-based, and Hybrid–DNN-based deep learning networks.

“…Inspired by the success of deep learning [14,21], the encoder-decoder framework with attention mechanisms [2] have been dominated in MWP [18][19][20], which bring the state-of-the-art to a new level. The key idea is to use an encoder to learn representations of problem text and employ a decoder to generate the corresponding solution expression and answer.…”

Section: Introduction (mentioning)

confidence: 99%

Expression Syntax Information Bottleneck for Math Word Problems

Xiong, Yang et al. 2022

Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Math Word Problems (MWP) aims to automatically solve mathematical questions given in texts. Previous studies tend to design complex models to capture additional information in the original text so as to enable the model to gain more comprehensive features. In this paper, we turn our attention in the opposite direction, and work on how to discard redundant features containing spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB) based on variational information bottleneck, which extracts essential features of expression syntax tree while filtering latent-specific redundancy containing syntax-irrelevant features. The key idea of ESIB is to encourage multiple models to predict the same expression syntax tree for different problem representations of the same problem by mutual learning so as to capture consistent information of expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss to encourage the model to rely more on the expression syntax information in the latent space. Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available.

Improving action quality assessment with across-staged temporal reasoning on imbalanced data

Lian, Shao 2023

Appl Intell
