Efficient Human Action Recognition Interface for Augmented and Virtual Reality Applications Based on Binary Descriptor

Fangbemi, Abassin Sourou; Liu, Bin; Yu, Neng Hai; Zhang, Yanxiang

doi:10.1007/978-3-319-95270-3_21

Cited by 17 publications

(17 citation statements)

References 10 publications

“…Due to the growing demand for automatic interpretation of human action, action recognition has caught the attention in both academia and industry [16]. Analyzing and understanding a person's action in a video is necessary for a wide range of applications, such as web video classification [17], assisting the visually impaired [8], surveillance and security.…”

Section: Action Recognition (mentioning)

confidence: 99%

“…We have At the time that a GoF arrives, the mobile device executes Eqs. (5) to (8), then selects the offloading decisions that can minimize the objective function of (mod-LOP).…”

Section: Stoi(t) Op(t) ∈ {0 1} (mentioning)

confidence: 99%

“…In this paper, we focus on how to enable the mobile device to efficiently recognize human actions in videos. One common use case is to assist the people with visual impairment, such as the APP named Seeing AI developed by Microsoft [8]. In this scenario, the visually impaired uses a carry-on mobile device with a camera to record videos.…”

Section: Introduction (mentioning)

confidence: 99%

Deep action: A mobile action recognition framework using edge offloading

Zhang, Duan, et al. 2021. Peer-to-Peer Netw. Appl.

Recording users' lives as short-form videos has been an emerging trend with the advance of mobile devices. The videos contain a wealth of information that requires a significant amount of computation to retrieve. In this paper, we propose Deep action, a framework that leverages edge offloading to enable human action recognition on mobile devices. Deep action first samples frames from a video according to the accuracy requirement. The sampled frames are then compressed and fed into deep learning models to generate an action label. Considering the varying conditions of the wireless connection, we design an online scheduler to strategically offload compressed video snippets to the edge server. Furthermore, we use OpenCL to implement the video compression-related operations on the mobile GPU, such that model inference and video compression can operate in parallel on the mobile device. We implement Deep action on the Android OS and evaluate it on a commercial off-the-shelf mobile device and an edge server. The performance evaluation demonstrates that Deep action brings up to 19× and 13× execution speedup compared to the local-only and remote-only strategies, respectively.

“…HAR has a high significance in a wide range of applications. Fields like video surveillance [1], [2], virtual reality [3], [4], intelligent human-computer interface [5], and identity recognition [6] have benefited from HAR.…”

Section: Introduction (mentioning)

confidence: 99%

Human Action Recognition Based on Transfer Learning Approach

et al. 2021

Human action recognition techniques have gained significant attention among next-generation technologies due to their specific features and high capability to inspect video sequences to understand human actions. As a result, many fields have benefited from human action recognition techniques. Deep learning techniques have played a primary role in many approaches to human action recognition. The new era of learning is spreading through transfer learning. Accordingly, this study's main objective is to propose a framework with three main phases for human action recognition. The phases are pre-training, preprocessing, and recognition. This framework presents a set of novel techniques that are three-fold, as follows: (i) in the pre-training phase, a standard convolutional neural network is trained on a generic dataset to adjust weights; (ii) to perform the recognition process, this pre-trained model is then applied to the target dataset; and (iii) the recognition phase exploits a convolutional neural network and long short-term memory to apply five different architectures. Three architectures are stand-alone and single-stream, while the other two are combinations of the first three in a two-stream style. Experimental results show that the first three architectures recorded accuracies of 83.24%, 90.72%, and 90.85%, respectively. The last two architectures achieved accuracies of 93.48% and 94.87%, respectively. Moreover, the recorded results outperform other state-of-the-art models in the same field.

INDEX TERMS: Convolutional neural network (CNN), Human action recognition (HAR), Long short-term memory (LSTM), Spatiotemporal info, Transfer learning (TL).

“…Human action recognition is an interdisciplinary research direction in the field of computer vision, involving image processing, computer vision, pattern recognition, machine learning, and artificial intelligence. With the rapid development of digital image processing technology and intelligent hardware manufacturing technology, human action recognition has wide application prospects in intelligent video monitoring [1][2][3][4], natural human computer interaction [5,6], smart home products [7][8][9], and virtual reality [10]. The popularity of human action recognition has led to several survey articles that have appeared in refs [11][12][13][14][15].…”

Section: Introduction (mentioning)

confidence: 99%

Using a Multilearner to Fuse Multimodal Features for Human Action Recognition

Tang, Wang, et al. 2020. Mathematical Problems in Engineering

The representation and selection of action features directly affect the recognition performance of human action recognition methods. A single feature is often affected by human appearance, environment, camera settings, and other factors. To address the problem that existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features with different modal information are proposed: an RGB-HOG feature based on RGB image information, which has good geometric scale invariance; a D-STIP feature based on depth images, which maintains the dynamic characteristics of human motion and has local invariance; and an S-JRPF feature based on skeleton information, which has a good ability to describe the spatial structure of motion. At the same time, multiple K-nearest neighbor classifiers with better generalization ability are used to integrate decision-making classification. The experimental results show that the algorithm achieves ideal recognition results on the public G3D and CAD60 datasets.
