Binary Dense SIFT Flow Based Position-Information Added Two-Stream CNN for Pedestrian Action Recognition

Park, Sang Kyoo; Chung, Jun Ho; Pae, Dong Sung; Lim, Myo Taeg

doi:10.3390/app122010445

Cited by 5 publications

(1 citation statement)

References 49 publications

“…It has a wide range of applications in real life. For example, human action recognition can be used for home monitoring to monitor the behavioral activities of the elderly and to detect dangerous actions such as falls in a timely manner [1], and it can help an automatic navigation system analyze and predict the action of pedestrians [2]. Commonly used inputs for human action recognition algorithms include RGB images and videos [3], skeleton [4], depth [5], point-cloud [6], and so on.…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Two-Stream Transformer-Based Framework for Multi-Modality Human Action Recognition

et al. 2023

Due to the great success of Vision Transformer (ViT) in image classification tasks, many pure Transformer architectures for human action recognition have been proposed. However, very few works have attempted to use Transformer to conduct bimodal action recognition, i.e., both skeleton and RGB modalities for action recognition. As proved in many previous works, RGB modality and skeleton modality are complementary to each other in human action recognition tasks. How to use both RGB and skeleton modalities for action recognition in a Transformer-based framework is a challenge. In this paper, we propose RGBSformer, a novel two-stream pure Transformer-based framework for human action recognition using both RGB and skeleton modalities. Using only RGB videos, we can acquire skeleton data and generate corresponding skeleton heatmaps. Then, we input skeleton heatmaps and RGB frames to Transformer at different temporal and spatial resolutions. Because the skeleton heatmaps are primary features compared to the original RGB frames, we use fewer attention layers in the skeleton stream. At the same time, two ways are proposed to fuse the information of two streams. Experiments demonstrate that the proposed framework achieves the state of the art on four benchmarks: three widely used datasets, Kinetics400, NTU RGB+D 60, and NTU RGB+D 120, and the fine-grained dataset FineGym99.

RLSTM: A Novel Residual and Recurrent Network for Pedestrian Action Classification

Gazzeh, Lo Presti, Douik et al. 2023

Lecture Notes in Computer Science

A survey of video-based human action recognition in team sports

Yin, Sinnott, Jayaputera 2024

Artif Intell Rev

Over the past few decades, numerous studies have focused on identifying and recognizing human actions using machine learning and computer vision techniques. Video-based human action recognition (HAR) aims to detect actions from video sequences automatically. This can cover simple gestures to complex actions involving multiple people interacting with objects. Actions in team sports exhibit a different nature compared to other sports, since they tend to occur at a faster pace and involve more human-human interactions. As a result, research has typically not focused on the challenges of HAR in team sports. This paper comprehensively summarises HAR-related research and applications with specific focus on team sports such as football (soccer), basketball and Australian rules football. Key datasets used for HAR-related team sports research are explored. Finally, common challenges and future work are discussed, and possible research directions identified.
