DeepActsNet: A deep ensemble framework combining features from face, hands, and body for action recognition

Asif, Umar; Mehta, Deval; Cavallar, Stefan von; Tang, Jianbin; Harrer, Stefan

doi:10.1016/j.patcog.2023.109484

Cited by 6 publications

(3 citation statements)

References 15 publications

“…There was a significant computational requirement in combining face and body visual features for human tracking [7], so this research proposed adding a tracking system. The proposed tracking system can use the KCF method [8], [9], or other tracking techniques [10].…”

Section: Figure 1 (A) Cases Of Face Visual Features Cannot Be Found (...

mentioning

confidence: 99%

Real-Time Human Tracking Using Multi-Features Visual With CNN-LSTM and Q-Learning

Maharani, Machbub, Rusmin et al. 2024

IEEE Access

Various methods are employed in computer vision applications to identify individuals, including using face recognition as a human visual feature helpful in tracking or searching for a person. However, tracking systems that rely solely on facial information encounter limitations, particularly when faced with occlusions, blurred images, or faces oriented away from the camera. Under these conditions, the system struggles to achieve accurate tracking-based face recognition. Therefore, this research addresses this issue by fusing descriptions of the face visual with body visual features. When the system cannot find the target face, the CNN+LSTM hybrid method assists in multi-feature body visual recognition, narrowing the search space and speeding up the search process. The results indicate that the combination of the CNN+LSTM method yields higher accuracy, recall, precision, and F1 scores (reaching 89.20%, 87.36%, 91.02%, and 88.43%, respectively) compared to the single CNN method (reaching 88.84%, 74.00%, 67.00%, and 69.00% respectively). However, the combination of these two visual features requires high computation. Thus, it is necessary to add a tracking system to reduce the computational load and predict the location. Furthermore, this research utilizes the Q-Learning algorithm to make optimal decisions in automatically tracking objects in dynamic environments. The system considers factors such as face and body visual features, object location, and environmental conditions to make the best decisions, aiming to enhance tracking efficiency and accuracy. Based on the conducted experiments, it is concluded that the system can adjust its actions in response to environmental changes with better outcomes. It achieves an accuracy rate of 91.5% and an average of 50 fps in five different videos, as well as a video benchmark dataset with an accuracy of 84% and an average error of 11.15 pixels. Utilizing the proposed method speeds up the search process and optimizes tracking decisions, saving time and computational resources.

“…Our motivation for action embedding comes from graph embedding, which can broadly be grouped into three main categories: factorization-based, random walk-based, and deep learning (DL) based methods [18,19]. Perozzi et al. [20] proposed a novel technique named DeepWalk for learning a latent representation of vertices in a graph network.…”

Section: Action Embedding

mentioning

confidence: 99%

“…In the last decade, CNN and GCN architectures with different variants remained popular choices for action recognition [17,18]. In this section, we survey the literature and organize it further as follows: (i) Graph-Action Embedding (ii) Self-attention Transformer (iii) Skeleton-based action recognition.…”

Section: Introduction

mentioning

confidence: 99%

Transforming Spatio-Temporal Self-Attention Using Action Embedding for Skeleton-Based Action Recognition

Ahmed, Rizvi, Kanwal 2023

Preprint

Transforming spatio-temporal self-attention using action embedding for skeleton-based action recognition

Ahmad, Rizvi, Kanwal 2023

Journal of Visual Communication and Image Representation
