Fully Autonomous UAV-Based Action Recognition System Using Aerial Imagery

Han, Peng; Razi, Abolfazl

doi:10.1007/978-3-030-64556-4_22

Cited by 12 publications (12 citation statements); references 21 publications.

“…C3D [139] has also been utilized for aerial action recognition [24], [98]. Others have also upgraded existing 2D networks such as Inception-ResNet [56], [133] with 3D convolutions to make them suitable for video processing [109]. Mou et al. experimented with multiple 3D CNNs, i.e.…”

Section: Two-stream CNNs (mentioning)

The State of Aerial Surveillance: A Survey

Nguyen, Fookes, Sridharan et al., 2022 (preprint)

The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs and other airborne platforms. The main objects of interest are humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss the unique challenges of performing it in an aerial setting compared to a ground-based setting. We then review and analyze the publicly available aerial datasets for each task, examine the approaches in the aerial literature in depth, and investigate how they currently address the aerial challenges. We conclude the paper with a discussion of the remaining gaps and open research questions to inform future research avenues.
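The first passage above mentions upgrading 2D networks such as Inception-ResNet with 3D convolutions for video. One common way to do this, assumed here and not necessarily the approach of [109], is I3D-style kernel inflation: each 2D kernel is repeated along a new temporal axis and divided by the temporal depth, so that on a clip of identical frames the 3D layer reproduces the original 2D response. The minimal NumPy sketch below checks that property with random weights; all function names are hypothetical and the plain loops favor clarity over speed.

```python
import numpy as np


def inflate_kernel(k2d: np.ndarray, t: int) -> np.ndarray:
    """Turn a (kh, kw) 2D kernel into a (t, kh, kw) 3D kernel.

    The 2D weights are repeated t times along a new temporal axis and divided
    by t, so a clip whose frames are all identical yields the 2D response.
    """
    return np.repeat(k2d[np.newaxis, :, :], t, axis=0) / t


def conv2d_valid(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation (what CNN layers call convolution)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out


def conv3d_valid(clip: np.ndarray, k: np.ndarray) -> np.ndarray:
    """'Valid' 3D cross-correlation over a (T, H, W) clip."""
    kt, kh, kw = k.shape
    out = np.zeros((clip.shape[0] - kt + 1,
                    clip.shape[1] - kh + 1,
                    clip.shape[2] - kw + 1))
    for f in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[f, i, j] = np.sum(clip[f:f + kt, i:i + kh, j:j + kw] * k)
    return out


# Toy check with random weights (not taken from any trained network).
rng = np.random.default_rng(0)
frame = rng.standard_normal((8, 8))
k2d = rng.standard_normal((3, 3))
static_clip = np.stack([frame] * 5)        # 5 identical frames
k3d = inflate_kernel(k2d, t=3)
out3d = conv3d_valid(static_clip, k3d)     # shape (3, 6, 6)
out2d = conv2d_valid(frame, k2d)           # shape (6, 6)
```

Every temporal slice of `out3d` matches `out2d`, which is why inflated weights can start from ImageNet-pretrained 2D filters before being fine-tuned on video.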

“…The availability of UAV datasets [16], [18], [19] has fostered aerial video research [20] pertaining to person re-identification [21], human detection [22], tracking [23], [24], pose estimation [25], few-shot learning [26], drone detection [27] and path planning [28]. Many architectures have been proposed to specifically tackle aerial video action recognition [29], [30], [31], [32], in addition to generic action recognition [2], [3]. Recently, FAR [15] proposed a frequency-based method to disentangle moving objects by modulating feature maps.…”

Section: Related Work, A) Aerial Video Recognition (mentioning)

Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition

Kothandaraman, Ma, Manocha, 2022 (preprint)

We present a learning algorithm for human activity recognition in videos. Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras that contain a human actor along with background motion. Typically, the human actors occupy less than one-tenth of the spatial resolution. Our approach simultaneously harnesses the benefits of frequency domain representations, a classical analysis tool in signal processing, and data-driven neural networks. We build a differentiable static-dynamic frequency mask prior to model the salient static and dynamic pixels in the video, which are crucial for the underlying task of action recognition. We use this differentiable mask prior to enable the neural network to intrinsically learn disentangled feature representations via an identity loss function. Our formulation empowers the network to inherently compute disentangled salient features within its layers. Further, we propose a cost function encapsulating temporal relevance and spatial content to sample the most important frame within uniformly spaced video segments. We conduct extensive experiments on the UAV-Human dataset and the NEC Drone dataset and demonstrate relative improvements of 5.72% to 13.00% over the state of the art and 14.28% to 38.05% over the corresponding baseline model.
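The abstract describes a differentiable static-dynamic frequency mask prior. As a rough, non-learned illustration of the underlying frequency idea only (not the authors' differentiable mask, identity loss, or frame-sampling cost function), a per-pixel temporal Fourier transform can split a clip into a static part (the zero-frequency bin) and a dynamic part (all non-zero temporal frequencies). The function names and the energy threshold below are hypothetical.

```python
import numpy as np


def static_dynamic_split(video: np.ndarray):
    """Split a (T, H, W) clip into static and dynamic parts per pixel.

    The zero-frequency (DC) bin of each pixel's temporal spectrum is the
    static part; every non-zero temporal frequency is treated as dynamic.
    """
    spectrum = np.fft.fft(video, axis=0)            # temporal FFT at every pixel
    dc_only = np.zeros_like(spectrum)
    dc_only[0] = spectrum[0]                        # keep only the DC bin
    static = np.real(np.fft.ifft(dc_only, axis=0))  # equals the per-pixel temporal mean
    dynamic = video - static                        # remaining (non-DC) content
    return static, dynamic


def dynamic_mask(video: np.ndarray, threshold: float) -> np.ndarray:
    """Binary (H, W) mask of pixels whose mean dynamic energy exceeds threshold."""
    _, dynamic = static_dynamic_split(video)
    energy = np.mean(dynamic ** 2, axis=0)
    return energy > threshold


# Toy clip: flat grey background, one pixel that blinks every other frame.
video = np.full((16, 8, 8), 0.5)
video[::2, 2, 3] = 1.0
static, dynamic = static_dynamic_split(video)
mask = dynamic_mask(video, threshold=1e-6)
```

Here only the blinking pixel is flagged as dynamic. A hard keep/drop of frequency bins like this is not differentiable with respect to the threshold, which is one reason a learned formulation such as the one described above would look different.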

“…Human detection and activity recognition (HDAR) using the highly challenging UCF-ARG aerial dataset has been done using various methods [8,77,83,84,85]. The combination of "The Fastest Pedestrian Detector in the West" (FPDW) [114] and moving object detection was utilized for human detection and tracking in UAV-based videos [83].…”

Section: Introduction (mentioning)

“…The SVM classifier served as the activity detector [84]. Lastly, an automated UAV-based DL algorithm consisted of video stabilization using SURF feature selection and the Lucas-Kanade method, human area detection using Faster R-CNN, and action recognition using a structure combining a three-dimensional CNN architecture and a residual network [85]. To address limitations encountered by methods described in the literature, we propose the use of EfficientDet-D7, which was the top state-of-the-art detector for human detection, to improve detection accuracy and thus classification accuracy.…”

Section: Introduction (mentioning)
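The passages above pair a pedestrian detector (FPDW) with moving object detection [83] and stabilize the video before detection [85]. The simplest form of moving object detection is frame differencing between two already-aligned frames; the NumPy sketch below shows only that step. It is an assumption for illustration, not the procedure of [83], and it does not reproduce FPDW, SURF/Lucas-Kanade stabilization, or Faster R-CNN. The function names and threshold are hypothetical.

```python
import numpy as np


def motion_mask(prev: np.ndarray, curr: np.ndarray, threshold: float) -> np.ndarray:
    """Mark pixels whose absolute intensity change between two frames exceeds threshold.

    Frames are cast to float first so uint8 subtraction cannot wrap around.
    Assumes prev and curr are already stabilized (aligned) grayscale frames.
    """
    diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
    return diff > threshold


def mask_bbox(mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) of the changed pixels, or None if nothing moved."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy frames: a 2x2 bright blob appears at rows 4-5, columns 3-4.
prev = np.zeros((10, 10), dtype=np.uint8)
curr = prev.copy()
curr[4:6, 3:5] = 200
mask = motion_mask(prev, curr, threshold=25)
box = mask_bbox(mask)
```

On a moving UAV camera, skipping the stabilization step would make the whole background register as motion, which is why [85] stabilizes first.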

A Comparison Between Various Human Detectors and CNN-Based Feature Extractors for Human Activity Recognition via Aerial Captured Video Sequences

et al. 2022

Human detection and activity recognition (HDAR) in videos plays an important role in various real-life applications. Recently, object detection methods such as "you only look once" (YOLO), faster region-based convolutional neural network (Faster R-CNN), and EfficientDet have been used to detect humans in videos for subsequent decision-making applications. This paper addresses the problem of human detection in aerial video sequences captured by a moving camera attached to an aerial platform, with dynamic conditions such as varied altitudes, illumination changes, camera jitter, and variations in viewpoints, object sizes and colors. Unlike traditional datasets, whose frames are captured by a static ground camera and contain medium or large human regions, the UCF-ARG aerial dataset is more challenging because its videos contain large distances between the humans in the frames and the camera. The performance of human detection methods described in the literature is often degraded when input video frames are distorted by noise, blur, illumination changes, and the like. To address these limitations, the object detection methods used in this study were trained on the COCO dataset and evaluated on the publicly available UCF-ARG dataset. These detectors were compared in terms of detection accuracy. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that EfficientDet-D7 outperformed the other detectors, with 92.9% average accuracy in detecting all activities under various conditions including blurring, addition of Gaussian noise, lightening, and darkening. Additionally, deep pre-trained convolutional neural networks (CNNs) such as ResNet and EfficientNet were used for transfer learning from the ImageNet dataset to the UCF-ARG dataset and to extract highly informative features from the detected and cropped human patches.
The extracted spatial features were fed to a Long Short-Term Memory (LSTM) network to model temporal relations between features for human activity recognition (HAR). Experimental results showed that EfficientNet-B7-LSTM outperformed existing HAR methods in terms of average accuracy (80%), average precision (83%), average recall (80%), average F1 score (80%), average false negative rate (FNR) (20%), average false positive rate (FPR) (4.8%), and average Area Under the Curve (AUC) (94%). The outcome is a robust HAR system that combines EfficientDet-D7 for human detection with EfficientNet-B7 and an LSTM for activity classification.
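The abstract reports averaged accuracy, precision, recall, F1, FNR and FPR without stating how the averaging is done. A common convention, assumed in this sketch, is to compute each metric per class in a one-vs-rest fashion from a confusion matrix and take the unweighted (macro) mean. AUC needs per-class scores rather than hard predictions, so it is left out. The confusion matrix below is a hypothetical toy example, not data from the paper.

```python
import numpy as np


def macro_metrics(cm) -> dict:
    """Macro-averaged one-vs-rest metrics from a confusion matrix.

    Convention (an assumption): rows are true classes, columns are predictions.
    Assumes every class is both present and predicted at least once, otherwise
    some per-class ratios divide by zero.
    """
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as class c, truly another class
    fn = cm.sum(axis=1) - tp      # truly class c, predicted as another class
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / total,    # overall fraction of correct predictions
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
        "fnr": (fn / (fn + tp)).mean(),  # per-class FNR equals 1 - recall
        "fpr": (fp / (fp + tn)).mean(),
    }


# Hypothetical 3-class confusion matrix (illustrative numbers, not from the paper).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
m = macro_metrics(cm)
```

For this toy matrix, accuracy and macro recall are both 0.9, macro FNR is 0.1 and macro FPR is 0.05. Because per-class FNR is 1 minus per-class recall, macro FNR is always 1 minus macro recall under this convention, which matches the reported pair of 80% average recall and 20% average FNR.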
